How to: using OpenXML to Update Fields in Word - openxml

I am working on a prototype where I am assembling word documents using OpenXML from various different documents like other word docs.
My question is this: How do you update the fields in those word documents without having word installed.
Fields include numbered sections in each section of the assembled document, updating the table of content field and also formula fields, like =SUM(Above) that should do the work on data inserted into a table during runtime.
These documents will never be opened by end users as word documents since they must be converted to PDF on the server.
Any help would be much appreciated.

Related

Exclude nested field from encryption Mongoose encryption

So i am using an npm package called mongoose-encryption all is great except one important thing. I have an array called reports and it has alot of objects in it. Each object has one unique field called report_id the thing is i need to perform a delete operation based on that ID but if i encrypt it mongoose apparently cant find it. For that i excluded some fields like the docs said but i cant apparently exclude a nested field i tried this:
usersSchema.plugin(encrypt,{secret:sigKey,excludeFromEncryption: ['username','reports.report_id']});
So the username is excluded from encryption but not the reports.report_id
any ideas?
The [document](https://www.npmjs.com/package/mongoose-encryption] says:
To encrypt, the relevant fields are removed from the document, converted to JSON, enciphered in Buffer format with the IV and plugin version prepended, and inserted into the _ct field of the document. Mongoose converts the _ct field to Binary when sending to mongo.
If the reports field is encrypted by mongoose-encryption, it will not be present at all in the document stored in MongoDB.
If you need the value of the reports.report_id field to be accessible in a query while encrypting the reports field, you'll need to copy them out to an array that is not being encrypted.

Spring Mongodb search string inside binary data

I am storing document(text,pdf,csv,doc,docx etc) in mongodb using spring rest.Documents are getting stored as binary data.
Now i want to search documents based on contents inside it.
For e.g. if user searches for string "office", he should see list of documents that contains string "office".
How can i query mongodb for data contains in binary data?
You could try to define a text index over your binary files. I don't know if it would work, but even if it does, such an index would match any words that are part of the file format rather than user content which is generally undesirable.
If I was implementing your requirements I would use a transformer from all of the binary documents to plain text (e.g. pandoc), thus obtaining the user content of each of the documents, then insert that content into a field which has a text index over it, then query on that field.

Autocomplete and text search memory issues in apostrophe-cms: need ideas

I’m having trouble to use the text search and the autocomplete because I have a piece with +87k documents, some of them being big (~3.4MB of text).
I already:
Removed every field from the text index, except title , searchBoost and seoDescription ; these are the only fields copied to highSearchText and the field lowSearchText is always set to an empty string.
Modified the standard text index, including the fields type, published and trash in the beginning of it. I'm also modified the queries to have equality conditions on these fields. The result returned by the command db.aposDocs.stats() shows:
type_1_published_1_trash_1_highSearchText_text_lowSearchText_text_title_text_searchBoost_text: 12201984 (~11 MB, fits nicely in memory)
Verified that this index is being used, both in ‘toDistinc’ query as well in the final ‘toArray’ query.
What I think is the biggest problem
The documents have many repeated words in the title, so if the user types a word present in 5k document titles, the server suffers.
Idea I'm testing
The MongoDB docs says that to improve performance the entire collection must fit in RAM (https://docs.mongodb.com/manual/core/index-text/#storage-requirements-and-performance-costs, last bullet).
So, I created a separate collection named “search” with just the fields highSearchText (string, indexed as text) and highSearchWords (array, also indexed), which result in total size of ~ 19 MB.
By doing the same operations of the standard apostrophe autocomplete in this collection, I achieved much faster, but similar results.
I had to write events to automatically update the search collection when the piece changes, but it seems to work until now.
Issues
I'm testing this search collection with the autocomplete. For the simple text search, I’m just limiting the sorted response to 50 results. Maybe I'll have to use the search collection as well, because the search could still breaks.
Is there some easier approach I'm missing? Please, any ideas are welcome.

Difference between wildcard search and individual text search

Is there a difference between a wildcard search index like $** and text indexes that I create for each of the fields in the collection ?
I do see a small difference in response time when I individually create text indexes. Using individual indexes, returns a better response. I am not able to post an example now, but will try to.
A wildcard text search will index every field that contains string data for each document in the collection (https://docs.mongodb.com/manual/core/index-text/#wildcard-text-indexes).
Because you are essentially increasing the number of fields indexed with a wild card text index, it would take longer to run compared to targeting specific fields for a text index.
Since you can only have one text index per collection (https://docs.mongodb.com/manual/core/index-text/#create-text-index), its worth considering which fields you plan on querying against beforehand.

How do I store tags associated with a document in Solr and retrieve their frequency in aggregate?

I am using Solr to index documents from a wiki. Each document has a unique id, title, body content and some other fields.
Firstly, I wanted to know the declaration in the Solr schema to store the multivalued field "tags" to hold n of these strings, attached to the document. Each document can have a set of tags applied on it.
Second, the use case - how exactly would I retrieve all distinct tags across the entire Solr instance with number of occurences, so that I can build a tag cloud of most popular tags?
thanks
Amit