Max item count in Cosmos DB trigger - triggers

I'm creating a pre-trigger for a Cosmos DB container. The pre-trigger is supposed to fetch all data related to the triggering document id. The incoming_document.items is always returning 100 when there are more than 100 documents expected (which seems to be limited by the query). I tried to set the pageSize property to -1 in the FeedOptions parameters and to use continuation, but it is still giving me 100. How can I fix this to give the total count?
Here is a simplified version of the code (without the continuation, I used a similar code to here):
function trgAddStats() {
var context = getContext();
var request = context.getRequest();
var incoming_document = request.getBody();
var container = context.getCollection();
var incoming_document.items = 1;
var filterQuery = {
"query": `SELECT t.customer, t.amount FROM Transactions_ds t WHERE t.customer = #customer`,
"parameters": [{
"name": "#customer",
"value": incoming_document.customer
}
]
};
var isAccepted = container.queryDocuments(container.getSelfLink(), filterQuery, {},
function (err, items, responseOptions) {
if (err) throw new Error("Error" + err.message);
incoming_document.items += items.length;
request.setBody(incoming_document);
}
);
if (!isAccepted) throw "Unable to update transaction, abort";
}

For getting more than 100 documents in Cosmos DB we can make use of x-ms-max-item-count.
The maximum number of values that can be returned by the query execution is done by the x-ms-max-item-count header.
The default value of the query results is 100 and it can be configured from 1–1000 using this header.
For more details regarding Pagination of query results in Microsoft Documentation.
You can Customize the number for Items per page in Query Explorer too like here.

Related

Get inserted Ids for Bulk.Insert() -Mongoskin

I am using a mongoskin in my nodeJs applicatipon to insert data in mongo db. I have a requirement to insert array of documents in database and send back the Ids of inserted records to the client. I am able to insert data however unable to locate the Ids of inserted records in the result Object. Need help to locate the insertedIds in the result. Im using the below code to bulk insert.
db.collection('myCollection', function (err, collection) {
var bulk = collection.initializeUnorderedBulkOp();
for (var i = 0; i < dataArray.length; i++) {
bulk.insert(dataArray[i]);
}
bulk.execute(function (err, result) {
//TODO: return the Ids of inserted records to the client
//Client will use these Ids to perform subsequent calls to the nodejs service
});
});
My result is a BatchWriteResult Object type.
Would suggest using the other bulk API method upsert() which will afford you to get in your BatchWriteResult() object the _id values of the inserted documents by calling its getUpsertedIds() method. The result object is in the same format as given in the documentation for BulkWriteResult.
The update operation with the Bulk.find.upsert() option will perform an insert when there are no matching documents for the Bulk.find() condition. If the update document does not specify an _id field, MongoDB adds the _id field and thus you can retrieve the id's of the inserted document
within your BatchWriteResult().
Also, the way you are queing up your bulk insert operations is not usually recommened since this basically builds up in memory; you'd want to have a bit of more control with managing the queues and memory resources other than relying on the driver's default way of limiting the batches of 1000 at a time, as well as the complete batch being under 16MB. The way you can do this is to use the forEach() loop of your data array with a counter that will help limit the batches to 1000 at a time.
The following shows the above approach
function getInsertedIds(result){
var ids = result.getUpsertedIds();
console.log(ids); // an array of upserted ids
return ids;
}
db.collection('myCollection',function(err,collection) {
var bulk = collection.initializeUnorderedBulkOp(),
insertedIds = [],
counter = 0;
dataArray.forEach(function (data){
bulk.find(data).upsert().updateOne(data);
counter++;
if (counter % 1000 == 0) {
bulk.execute(function(err, result) {
insertedIds = getInsertedIds(result);
bulk = collection.initializeUnorderedBulkOp(); // reset after execute
});
}
});
// Clean up the remaining operations in the queue which were
// cut off in the loop - counter not a round divisor of 1000
if (counter % 1000 != 0 ) {
bulk.execute(function(err, result) {
insertedIds = insertedIds.concat(getInsertedIds(result));
console.log(insertedIds);
});
}
});

MongoDB how to delete temporary collection after operations finish

Sometimes I create a temporary collection to aggregate data from multiple collections into a single collection for reporting. I need to drop this temporary collection shortly after a report is created to avoid filling up disk space with temporary collections from many report requests.
Currently I execute this from the application
db.dropCollection('tempCollection564e1f5a4abea9100523ade5');
but the results are not consistent each time it runs. Sometimes the collection drops successfully, but other times the collection fails to drop with this error message:
MongoError: exception: cannot perform operation: a background operation is currently running for collection databaseName.tempCollectionName
code: 12587
What is a best practice for deleting temporary collections in MongoDB? I currently name the collection with a UUID to avoid name collisions, and the collection is only used once before I attempt to destroy the temporary collection.
Is there a way to check if operations are in progress for a collection, and then drop the collection when the operations complete?
note: I do not believe this is an issue with javascript async code in the application. I call the dropCollection() after the aggregation query completes.
I ended up creating this mongoose plugin, and it's been running great in production for over a year. I create a temporary collection, then use setTimeout() to drop the collection 1 minute later. 1 minute is sufficient to query the collection, so the collection is no longer in use.
This creates collections with unique names, such as z_tempCollection_595820e4ae61ecc89635f794, so there is never a name collision.
var mongoose = require('mongoose');
var _ = require('lodash');
var util1 = require(global.appRootPath + '/lib/util1_lib.js');
function tempCollection(persistantSchema){
persistantSchema.statics.resultIntoTempCollection = function (tempCollectionDataArray, options, callback) {
var timestampSeconds = Math.round(Date.now() / 1000);
var tmpCollectionName = 'z_tempCollection_' + (new mongoose.mongo.ObjectId().toString()) + '_' + timestampSeconds;
var registeredModelName = 'tempModel' + tmpCollectionName;
options = options || {};
options.strict = _.isUndefined(options.strict) ? false : options.strict;
options.schema = _.isUndefined(options.schema) ? {} : options.schema;
var tmpSchema = new mongoose.Schema(options.schema, {strict: options.strict, collection: tmpCollectionName});
tmpSchema.statics.removeTempCollection = function(tempModel){
var maxRemovalAttempts = 3;
delete mongoose.models[registeredModelName];
delete mongoose.modelSchemas[registeredModelName];
setTimeout(function(){
mongoose.connection.db.dropCollection(tmpCollectionName, function (err, result) {
if (err) {
util1.saveError(err, 'server', null);
}
});
}, 60 * 1000);
}
// tempModel variable ref is overwritten on each subsequent run of resultIntoTempCollection
var tempModel = mongoose.model(registeredModelName, tmpSchema);
var promises = [];
tempCollectionDataArray.forEach(function(doc){
promises.push(new tempModel(doc).save());
});
return Promise.all(promises).then(function(){
return tempModel;
});
}
}
module.exports = tempCollection;

Average Aggregation Queries in Meteor

Ok, still in my toy app, I want to find out the average mileage on a group of car owners' odometers. This is pretty easy on the client but doesn't scale. Right? But on the server, I don't exactly see how to accomplish it.
Questions:
How do you implement something on the server then use it on the client?
How do you use the $avg aggregation function of mongo to leverage its optimized aggregation function?
Or alternatively to (2) how do you do a map/reduce on the server and make it available to the client?
The suggestion by #HubertOG was to use Meteor.call, which makes sense and I did this:
# Client side
Template.mileage.average_miles = ->
answer = null
Meteor.call "average_mileage", (error, result) ->
console.log "got average mileage result #{result}"
answer = result
console.log "but wait, answer = #{answer}"
answer
# Server side
Meteor.methods average_mileage: ->
console.log "server mileage called"
total = count = 0
r = Mileage.find({}).forEach (mileage) ->
total += mileage.mileage
count += 1
console.log "server about to return #{total / count}"
total / count
That would seem to work fine, but it doesn't because as near as I can tell Meteor.call is an asynchronous call and answer will always be a null return. Handling stuff on the server seems like a common enough use case that I must have just overlooked something. What would that be?
Thanks!
As of Meteor 0.6.5, the collection API doesn't support aggregation queries yet because there's no (straightforward) way to do live updates on them. However, you can still write them yourself, and make them available in a Meteor.publish, although the result will be static. In my opinion, doing it this way is still preferable because you can merge multiple aggregations and use the client-side collection API.
Meteor.publish("someAggregation", function (args) {
var sub = this;
// This works for Meteor 0.6.5
var db = MongoInternals.defaultRemoteCollectionDriver().mongo.db;
// Your arguments to Mongo's aggregation. Make these however you want.
var pipeline = [
{ $match: doSomethingWith(args) },
{ $group: {
_id: whatWeAreGroupingWith(args),
count: { $sum: 1 }
}}
];
db.collection("server_collection_name").aggregate(
pipeline,
// Need to wrap the callback so it gets called in a Fiber.
Meteor.bindEnvironment(
function(err, result) {
// Add each of the results to the subscription.
_.each(result, function(e) {
// Generate a random disposable id for aggregated documents
sub.added("client_collection_name", Random.id(), {
key: e._id.somethingOfInterest,
count: e.count
});
});
sub.ready();
},
function(error) {
Meteor._debug( "Error doing aggregation: " + error);
}
)
);
});
The above is an example grouping/count aggregation. Some things of note:
When you do this, you'll naturally be doing an aggregation on server_collection_name and pushing the results to a different collection called client_collection_name.
This subscription isn't going to be live, and will probably be updated whenever the arguments change, so we use a really simple loop that just pushes all the results out.
The results of the aggregation don't have Mongo ObjectIDs, so we generate some arbitrary ones of our own.
The callback to the aggregation needs to be wrapped in a Fiber. I use Meteor.bindEnvironment here but one can also use a Future for more low-level control.
If you start combining the results of publications like these, you'll need to carefully consider how the randomly generated ids impact the merge box. However, a straightforward implementation of this is just a standard database query, except it is more convenient to use with Meteor APIs client-side.
TL;DR version: Almost anytime you are pushing data out from the server, a publish is preferable to a method.
For more information about different ways to do aggregation, check out this post.
I did this with the 'aggregate' method. (ver 0.7.x)
if(Meteor.isServer){
Future = Npm.require('fibers/future');
Meteor.methods({
'aggregate' : function(param){
var fut = new Future();
MongoInternals.defaultRemoteCollectionDriver().mongo._getCollection(param.collection).aggregate(param.pipe,function(err, result){
fut.return(result);
});
return fut.wait();
}
,'test':function(param){
var _param = {
pipe : [
{ $unwind:'$data' },
{ $match:{
'data.y':"2031",
'data.m':'01',
'data.d':'01'
}},
{ $project : {
'_id':0
,'project_id' : "$project_id"
,'idx' : "$data.idx"
,'y' : '$data.y'
,'m' : '$data.m'
,'d' : '$data.d'
}}
],
collection:"yourCollection"
}
Meteor.call('aggregate',_param);
}
});
}
If you want reactivity, use Meteor.publish instead of Meteor.call. There's an example in the docs where they publish the number of messages in a given room (just above the documentation for this.userId), you should be able to do something similar.
You can use Meteor.methods for that.
// server
Meteor.methods({
average: function() {
...
return something;
},
});
// client
var _avg = { /* Create an object to store value and dependency */
dep: new Deps.Dependency();
};
Template.mileage.rendered = function() {
_avg.init = true;
};
Template.mileage.averageMiles = function() {
_avg.dep.depend(); /* Make the function rerun when _avg.dep is touched */
if(_avg.init) { /* Fetch the value from the server if not yet done */
_avg.init = false;
Meteor.call('average', function(error, result) {
_avg.val = result;
_avg.dep.changed(); /* Rerun the helper */
});
}
return _avg.val;
});

Are DBRefs supported in Meteor yet? [duplicate]

I'm using meteor 0.3.7 in Win7(32) and trying to create a simple logging system using 2 MongoDB collections to store data that are linked by DBRef.
The current pseudo schema is :
Users {
username : String,
password : String,
created : Timestamp,
}
Logs {
user_id : DBRef {$id, $ref}
message : String
}
I use server methods to insert the logs so I can do some upserts on the clients collection.
Now I want to do an old "left join" and display a list of the last n logs with the embedded User name.
I don't want to embed the Logs in Users because the most used operation is getting the last n logs. Embedding in my opinion was going to have a big impact in performance.
What is the best approach to achieve this?
Next it was great if possible to edit the User name and all items change theis name
Regards
Playing around with Cursor.observe answered my question. It may not be the most effective way of doing this, but solves my future problems of derefering DBRefs "links"
So for the server we need to publish a special collection. One that can enumerate the cursor and for each document search for the corresponding DBRef.
Bare in mind this implementation is hardcoded and should be done as a package like UnRefCollection.
Server Side
CC.Logs = new Meteor.Collection("logs");
CC.Users = new Meteor.Collection("users");
Meteor.publish('logsAndUsers', function (page, size) {
var self = this;
var startup = true;
var startupList = [], uniqArr = [];
page = page || 1;
size = size || 100;
var skip = (page - 1) * size;
var cursor = CC.Logs.find({}, {limit : size, skip : skip});
var handle = cursor.observe({
added : function(doc, idx){
var clone = _.clone(doc);
var refId = clone.user_id.oid; // showld search DBRefs
if (startup){
startupList.push(clone);
if (!_.contains(uniqArr, refId))
uniqArr.push(refId);
} else {
// Clients added logs
var deref = CC.Users.findOne({_id : refid});
clone.user = deref;
self.set('logsAndUsers', clone._id, clone);
self.flush();
}
},
removed : function(doc, idx){
self.unset('logsAndUsers', doc._id, _.keys(doc));
self.flush();
},
changed : function(new_document, idx, old_document){
var set = {};
_.each(new_document, function (v, k) {
if (!_.isEqual(v, old_document[k]))
set[k] = v;
});
self.set('logsAndUsers', new_document._id, set);
var dead_keys = _.difference(_.keys(old_document), _.keys(new_document));
self.unset('logsAndUsers', new_document._id, dead_keys);
self.flush();
},
moved : function(document, old_index, new_index){
// Not used
}
});
self.onStop(function(){
handle.stop();
});
// Deref on first Run
var derefs = CC.Users.find({_id : {$in : uniqArr} }).fetch();
_.forEach(startupList, function (item){
_.forEach(derefs, function(ditems){
if (item["user_id"].oid === ditems._id){
item.user = ditems;
return false;
}
});
self.set('logsAndUsers', item._id, item);
});
delete derefs; // Not needed anymore
startup = false;
self.complete();
self.flush();
});
For each added logs document it'll search the users collection and try to add to the logs collection the missing information.
The added function is called for each document in the logs collection in the first run I created a startupList and an array of unique users ids so for the first run it'll query the db only once. Its a good idea to put a paging mechanism to speed up things.
Client Side
On the client, subscribe to the logsAndUsers collection, if you want to make changes do it directly to the Logs collection.
LogsAndUsers = new Meteor.collection('logsAndUser');
Logs = new Meteor.colection('logs'); // Changes here are observed in the LogsAndUsers collection
Meteor.autosubscribe(function () {
var page = Session.get('page') || 1;
Meteor.subscribe('logsAndUsers', page);
});
Why not just also store the username in the logs collection as well?
Then you can query on them directly without needing any kind of "join"
If for some reason you need to be able to handle that username change, you just fetch the user object by name, then query on Logs with { user_id : user._id }

Meteor and DBRefs

I'm using meteor 0.3.7 in Win7(32) and trying to create a simple logging system using 2 MongoDB collections to store data that are linked by DBRef.
The current pseudo schema is :
Users {
username : String,
password : String,
created : Timestamp,
}
Logs {
user_id : DBRef {$id, $ref}
message : String
}
I use server methods to insert the logs so I can do some upserts on the clients collection.
Now I want to do an old "left join" and display a list of the last n logs with the embedded User name.
I don't want to embed the Logs in Users because the most used operation is getting the last n logs. Embedding in my opinion was going to have a big impact in performance.
What is the best approach to achieve this?
Next it was great if possible to edit the User name and all items change theis name
Regards
Playing around with Cursor.observe answered my question. It may not be the most effective way of doing this, but solves my future problems of derefering DBRefs "links"
So for the server we need to publish a special collection. One that can enumerate the cursor and for each document search for the corresponding DBRef.
Bare in mind this implementation is hardcoded and should be done as a package like UnRefCollection.
Server Side
CC.Logs = new Meteor.Collection("logs");
CC.Users = new Meteor.Collection("users");
Meteor.publish('logsAndUsers', function (page, size) {
var self = this;
var startup = true;
var startupList = [], uniqArr = [];
page = page || 1;
size = size || 100;
var skip = (page - 1) * size;
var cursor = CC.Logs.find({}, {limit : size, skip : skip});
var handle = cursor.observe({
added : function(doc, idx){
var clone = _.clone(doc);
var refId = clone.user_id.oid; // showld search DBRefs
if (startup){
startupList.push(clone);
if (!_.contains(uniqArr, refId))
uniqArr.push(refId);
} else {
// Clients added logs
var deref = CC.Users.findOne({_id : refid});
clone.user = deref;
self.set('logsAndUsers', clone._id, clone);
self.flush();
}
},
removed : function(doc, idx){
self.unset('logsAndUsers', doc._id, _.keys(doc));
self.flush();
},
changed : function(new_document, idx, old_document){
var set = {};
_.each(new_document, function (v, k) {
if (!_.isEqual(v, old_document[k]))
set[k] = v;
});
self.set('logsAndUsers', new_document._id, set);
var dead_keys = _.difference(_.keys(old_document), _.keys(new_document));
self.unset('logsAndUsers', new_document._id, dead_keys);
self.flush();
},
moved : function(document, old_index, new_index){
// Not used
}
});
self.onStop(function(){
handle.stop();
});
// Deref on first Run
var derefs = CC.Users.find({_id : {$in : uniqArr} }).fetch();
_.forEach(startupList, function (item){
_.forEach(derefs, function(ditems){
if (item["user_id"].oid === ditems._id){
item.user = ditems;
return false;
}
});
self.set('logsAndUsers', item._id, item);
});
delete derefs; // Not needed anymore
startup = false;
self.complete();
self.flush();
});
For each added logs document it'll search the users collection and try to add to the logs collection the missing information.
The added function is called for each document in the logs collection in the first run I created a startupList and an array of unique users ids so for the first run it'll query the db only once. Its a good idea to put a paging mechanism to speed up things.
Client Side
On the client, subscribe to the logsAndUsers collection, if you want to make changes do it directly to the Logs collection.
LogsAndUsers = new Meteor.collection('logsAndUser');
Logs = new Meteor.colection('logs'); // Changes here are observed in the LogsAndUsers collection
Meteor.autosubscribe(function () {
var page = Session.get('page') || 1;
Meteor.subscribe('logsAndUsers', page);
});
Why not just also store the username in the logs collection as well?
Then you can query on them directly without needing any kind of "join"
If for some reason you need to be able to handle that username change, you just fetch the user object by name, then query on Logs with { user_id : user._id }