Control Meteor Mongo Query Batch Size - mongodb

I'm setting up simple pub/sub on a mongo collection in a meteor application.
// On Server
Meteor.publish('records', function (params) {
  if (params.gender === "Male") {
    params.gender = "M";
  } else if (params.gender === "Female") {
    params.gender = "F";
  }
  return Records.find({
    gender: params.gender || {$exists: true},
    age: {
      $lte: params.ageRange.max,
      $gte: params.ageRange.min
    }
  });
});

// On Client
this.computationBlock = Tracker.autorun(() => {
  Meteor.subscribe("records",
    _.pick(this, ["gender", "ageRange"]));
  RecordActions.recordsChange(Records.find({}).fetch());
});
These queries occasionally return 10,000+ documents in total, and the issue is that the subscription delivers them in batches of roughly 300-500 documents. I have visualizations of aggregated metrics from the query, and these visualizations often remain in flux for over 10 seconds as each new batch of the query result is brought down from the subscription/publication. I'm pretty sure that if I could configure the batchSize property of the mongo cursor returned from Collection.find(), I could alleviate the problem simply by returning more documents per batch, but I cannot find how to do this in meteor land.
cursor.batchSize is undefined in both the client and server code, although it is part of the native mongo API (http://docs.mongodb.org/manual/reference/method/cursor.batchSize/). Has anyone had any luck configuring this parameter, or, even better, telling meteor to fetch the entire query result in one shot instead of batching it?
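For reference, the native-driver behaviour being described looks roughly like this; a sketch assuming a plain node-mongodb-native collection object named records, outside of Meteor, with illustrative filter values:

// Plain node-mongodb-native usage, not Meteor: batchSize() hints how many
// documents the server should return per getMore round trip.
var cursor = records
  .find({ gender: "M", age: { $gte: 20, $lte: 40 } })
  .batchSize(5000); // ask for up to 5000 documents per batch

cursor.toArray(function (err, docs) {
  // docs holds the full result set once all batches have arrived
});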

Related

Bulk.getOperations() in MongoDB Node driver

I'd like to view the results of a bulk operation, specifically to know the IDs of the documents that were updated. I understand that this information is made available through the Bulk.getOperations() method. However, it doesn't appear that this method is available through the MongoDB NodeJS library (at least, the one I'm using).
Could you please let me know if there's something I'm doing wrong here:
const bulk = db.collection('companies').initializeOrderedBulkOp()
const results = getLatestFinancialResults() // from remote API

results.forEach(result =>
  bulk.find({
    name: result.companyName,
    report: { $ne: result.report }
  }).updateOne([
    { $unset: 'prevReport' },
    { $set: { prevReport: '$report' } },
    { $unset: 'report' },
    { $set: { report: result.report } }
  ]))

await bulk.execute()
await bulk.getOperations() // <-- fails, undefined in TypeScript library
I get a static IDE error:
Uncaught TypeError: bulk.getOperations is not a function
I'd like to view the results of a bulk operation, specifically to know the IDs of the documents that were updated
As of the current MongoDB server release (v6.x), there is no method that returns the IDs of updated documents from a bulk operation (only insert and upsert operations report IDs). However, there may be a workaround depending on your use case.
The manual page that you linked for Bulk.getOperations() is for mongosh, the MongoDB Shell application. If you look into the source code for getOperations() in mongosh, it's just a convenience wrapper around batches, which returns the list of operations sent for the bulk execution.
As you are utilising ordered bulk operations, MongoDB executes the operations serially. If an error occurs during the processing of one of the write operations, MongoDB will return without processing any remaining write operations in the list.
Depending on the use case, if you modify the bulk.find() part to contain a search for _id for example:
bulk.find({"_id": result._id}).updateOne({$set:{prevReport:"$report"}});
You should be able to see the _id value of the operation in the batches, i.e.
await bulk.execute();
console.log(JSON.stringify(bulk.batches));
Example output:
{
  "originalZeroIndex": 0,
  "currentIndex": 0,
  "originalIndexes": [0],
  "batchType": 2,
  "operations": [
    {
      "q": { "_id": "634354787d080d3a1e3da51f" },
      "u": { "$set": { "prevReport": "$report" } }
    }
  ],
  "size": 0,
  "sizeBytes": 0
}
For additional information, you could also inspect the BulkWriteResult returned by execute(), for example getLastOp() to retrieve the last operation (useful in case of a failure).
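A minimal sketch of what that inspection might look like, assuming the legacy bulk API's BulkWriteResult counters and the getLastOp() accessor mentioned above:

const writeResult = await bulk.execute();

// Counters reported on the result object
console.log(writeResult.nMatched, writeResult.nModified);

// getLastOp() (as referenced above) returns the last operation applied,
// which can help pinpoint where an ordered bulk stopped after an error
console.log(writeResult.getLastOp());

// The operations that were actually sent, grouped into batches
console.log(JSON.stringify(bulk.batches, null, 2));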

Spring WebFlux + MongoDB: Tailable Cursor and Aggregation

I'm new to WebFlux and MongoDB. I'm trying to use aggregation on a capped collection with a tailable cursor, but I'm not having any success.
I'd like to execute this MongoDB query:
db.structures.aggregate([
  {
    $match: {
      id: { $in: [8244, 8052] }
    }
  },
  { $sort: { id: 1, lastUpdate: 1 } },
  {
    $group: {
      _id: { id: "$id" },
      lastUpdate: { $last: "$lastUpdate" }
    }
  }
])
ReactiveMongoOperations gives me the option to "tail" or to "aggregate".
I'm able to execute the aggregation:
MatchOperation match = new MatchOperation(Criteria.where("id").in(8244, 8052));
GroupOperation group = Aggregation.group("id", "$id").last("$lastUpdate").as("lastUpdate");
Aggregation aggregate = Aggregation.newAggregation(match, group);
Flux<Structure> result = mongoOperation.aggregate(aggregate,
"structures", Structure.class);
Or the tailable cursor:
Query query = new Query();
query.addCriteria(Criteria.where("id").in(8244, 8052));
Flux<Structure> result = mongoOperation.tail(query, Structure.class);
Is it possible? Tail and Aggregation together?
Using aggregation was the way that I found to get only the last inserted document for each id.
Without aggregation I get: [screenshot: query without aggregation]
With aggregation: [screenshot: query with aggregation]
Thanks in advance
A tailable cursor query creates a Flux that never completes (it never emits an onComplete event) and that emits records as they are inserted into the database. Because of that, I would think aggregations are not allowed by the database engine on a tailable cursor.
An aggregation also doesn't quite make sense here, because with every newly inserted record the aggregation would need to be recomputed. Technically you can do a running aggregation where, for every returned record, you compute the wanted aggregate record and send it downstream.
One possible solution would be to do the aggregations programmatically on the returned "infinite" Flux:
mongoOperation.tail(query, Structure.class)
    .groupBy(Structure::id) // create independent Fluxes based on id
    .flatMap(groupedFlux ->
        groupedFlux.scan((result, nextStructure) -> { // scan is like reduce but emits intermediate results
            log.info("intermediate result is: {}", result);
            if (result.getLastUpdate() > nextStructure.getLastUpdate()) {
                return result;
            } else {
                result.setLastUpdate(nextStructure.getLastUpdate());
                return result;
            }
        }));
On the other hand, you should probably revisit your use case and what you need to accomplish here, and see whether something other than a capped collection should be used, or whether the aggregation part is redundant (i.e. if newly inserted records always have a lastUpdate property larger than the previous record's).

Mongo indexing in Meteor

Not sure I'm understanding how mongo queries are indexed in Meteor. Right now, none of my queries are indexed. On some of the pages in the app, there are 15 or 20 links that each trigger a unique mongo query. Would each query be indexed individually?
For example, if one of the queries is something like:
Template.myTemplate.helpers({
  ...
  if (weekly === "1") {
    var firstWeekers = _.where(progDocs, {Week1: "1"}),
        firstWeekNames = firstWeekers.map(function (doc) {
          return doc.FullName;
        });
    return Demographic.find({ FullName: { $in: firstWeekNames }}, { sort: { FullName: 1 }});
  }
  ...
})
How would I implement each of the indexes?
Firstly, minimongo (mongo on the client side) runs in memory, so indexing is much less of a factor there than it is on disk. To minimize network consumption you also generally want to keep your collections on the client fairly small, which makes client-side indexing even less important.
On the server, however, indexing can be critical to good performance. There are two common ways to set up indexes on the server (a third, creating the index from server code, is sketched after this list):
via the meteor mongo shell, e.g. db.demographic.createIndex( { FullName: 1 } )
via setting the field to be indexed in your schema when using the Collection2 package. See aldeed:schema-index
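And for completeness, a minimal sketch of creating the index from server code at startup, assuming the Demographic collection from the question (rawCollection() exposes the underlying node driver collection):

if (Meteor.isServer) {
  Meteor.startup(function () {
    // createIndex is effectively idempotent: it is a no-op if the index already exists
    Demographic.rawCollection().createIndex({ FullName: 1 });
  });
}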

Inserting multiple documents into mongodb using one call in Meteor

In the mongo shell, it is possible to insert an array of documents with one call. In a Meteor project, I have tried using
MyCollection = new Mongo.Collection("my_collection")
documentArray = [{"one": 1}, {"two": 2}]
MyCollection.insert(documentArray)
However, when I check my_collection from the mongo shell, it shows that only one document has been inserted, and that document contains the entire array as if it had been a map:
db.my_collection.find({})
{ "_id" : "KPsbjZt5ALZam4MTd", "0" : { "one" : 1 }, "1" : { "two" : 2} }
Is there a Meteor call that I can use to add a series of documents all at once, or must I use a technique such as the one described here?
I imagine that inserting multiple documents in a single call would optimize performance on the client side, where the new documents would become available all at once.
You could use the bulk API to do the bulk insert on the server side. Iterate over the array with forEach() and, within the loop, queue each document with a bulk insert operation; bulk operations are simply abstractions on top of the server that make it easy to build up bulk writes.
Note that for MongoDB servers older than 2.6 the API will down-convert the operations. However, it's not possible to down-convert 100%, so there might be some edge cases where it cannot correctly report the right numbers.
You can get raw access to the collection and database objects of the npm MongoDB driver through the rawCollection and rawDatabase methods on Mongo.Collection:
MyCollection = new Mongo.Collection("my_collection");

if (Meteor.isServer) {
  Meteor.startup(function () {
    Meteor.methods({
      insertData: function() {
        var bulkOp = MyCollection.rawCollection().initializeUnorderedBulkOp(),
            counter = 0,
            documentArray = [{"one": 1}, {"two": 2}];

        documentArray.forEach(function(data) {
          bulkOp.insert(data);
          counter++;
          // Send to server in batches of 1000 insert operations
          if (counter % 1000 == 0) {
            // Execute per 1000 operations and re-initialize every 1000 statements
            bulkOp.execute(function(e, result) {
              // do something with result
            });
            bulkOp = MyCollection.rawCollection().initializeUnorderedBulkOp();
          }
        });

        // Clean up any remaining queued operations
        if (counter % 1000 != 0) {
          bulkOp.execute(function(e, result) {
            // do something with result
          });
        }
      }
    });
  });
}
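A hypothetical client-side call to exercise that method (the method name insertData comes from the snippet above):

// On the client: trigger the server-side bulk insert
Meteor.call('insertData', function (err) {
  if (err) {
    console.error('bulk insert failed', err);
  }
});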
I'm currently using the mikowals:batch-insert package.
Your code would work then with one small change:
MyCollection = new Mongo.Collection("my_collection");
documentArray = [{"one": 1}, {"two": 2}];
MyCollection.batchInsert(documentArray);
The one drawback of this I've noticed is that it doesn't honor simple-schema.

Identify last document from MongoDB find() result set

I'm trying to 'stream' data from a node.js/MongoDB instance to the client using websockets. It is all working well.
But how do I identify the last document in the result set? I'm using node-mongodb-native to connect to MongoDB from node.js.
A simplified example:
collection.find({}, {}, function(err, cursor) {
  if (err) sys.puts(err.message);
  cursor.each(function(err, doc) {
    client.send(doc);
  });
});
Since the MongoDB ObjectId contains the creation date, you can sort by _id descending and then use limit(1):
db.collection.find().sort( { _id : -1 } ).limit(1);
Note: I am not familiar with node.js at all; the above is a mongo shell command, and I suppose you can easily rewrite it for node.js.
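A rough node-mongodb-native equivalent of that shell command (a sketch, assuming a collection object is already in scope):

// Sort by _id descending and take one document: the most recently created one
collection.find({}).sort({ _id: -1 }).limit(1).toArray(function (err, docs) {
  if (err) return console.error(err);
  var lastDoc = docs[0];
  // lastDoc is the last document by ObjectId creation time
});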
Say I have a companies collection. The snippet below gives me the last document in the collection.
db.companies.find({},{"_id":1}).skip(db.companies.find().count()-1);
Your code cannot rely on _id, since it may not follow a predictable pattern if it's a user-defined value.
Use sort and limit if you want to work with a cursor:
var last = null;
// The driver's cursor can be consumed as a stream of "data"/"end" events
var findCursor = collection.find({}).stream();
findCursor.on("data", function(data) {
  last = data;
  ...
});
findCursor.on("end", function(data) {
  // the last document received is in `last`
  ....
});