Inserting multiple documents into mongodb using one call in Meteor - mongodb

In the mongo shell, it is possible to insert an array of documents with one call. In a Meteor project, I have tried using
MyCollection = new Mongo.Collection("my_collection")
documentArray = [{"one": 1}, {"two": 2}]
MyCollection.insert(documentArray)
However, when I check my_collection from the mongo shell, it shows that only one document has been inserted, and that document contains the entire array as if it had been a map:
db.my_collection.find({})
{ "_id" : "KPsbjZt5ALZam4MTd", "0" : { "one" : 1 }, "1" : { "two" : 2} }
Is there a Meteor call that I can use to add a series of documents all at once, or must I use a technique such as the one described here?
I imagine that inserting multiple documents in a single call would optimize performance on the client side, where the new documents would become available all at once.

You could use the Bulk API to do the bulk insert on the server side. Iterate over the array with forEach() and, within the loop, queue each document as a bulk insert operation; bulk operations are simply abstractions on top of the server that make it easy to build up batches of writes.
Note that for MongoDB servers older than 2.6 the API will down-convert the operations. However, it's not possible to down-convert 100%, so there may be edge cases where it cannot correctly report the right numbers.
You can get raw access to the collection and database objects of the npm MongoDB driver through the rawCollection() and rawDatabase() methods on Mongo.Collection:
MyCollection = new Mongo.Collection("my_collection");

if (Meteor.isServer) {
    Meteor.startup(function () {
        Meteor.methods({
            insertData: function() {
                var bulkOp = MyCollection.rawCollection().initializeUnorderedBulkOp(),
                    counter = 0,
                    documentArray = [{"one": 1}, {"two": 2}];

                documentArray.forEach(function(data) {
                    bulkOp.insert(data);
                    counter++;
                    // Send to the server in batches of 1000 insert operations
                    if (counter % 1000 == 0) {
                        // Execute per 1000 operations and re-initialize for the next batch
                        bulkOp.execute(function(e, result) {
                            // do something with the result
                        });
                        bulkOp = MyCollection.rawCollection().initializeUnorderedBulkOp();
                    }
                });

                // Flush any remaining queued operations
                if (counter % 1000 != 0) {
                    bulkOp.execute(function(e, result) {
                        // do something with the result
                    });
                }
            }
        });
    });
}
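The method could then be invoked from the client, for example:
// A minimal sketch: call the server-side method defined above from the client
Meteor.call('insertData', function (error, result) {
    if (error) {
        console.error('Bulk insert failed:', error);
    }
});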

I'm currently using the mikowals:batch-insert package.
Your code would then work with one small change:
MyCollection = new Mongo.Collection("my_collection");
documentArray = [{"one": 1}, {"two": 2}];
MyCollection.batchInsert(documentArray);
The one drawback of this I've noticed is that it doesn't honor simple-schema.
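If you depend on the schema, one workaround is to validate the documents yourself before the batch insert. A minimal sketch, assuming mySchema is a SimpleSchema instance (aldeed:simple-schema) that would otherwise be attached to the collection:
// batchInsert bypasses the attached schema, so validate each document manually;
// mySchema is an assumed SimpleSchema instance, and newContext().validate()
// returns true when the document passes validation
var validDocs = documentArray.filter(function (doc) {
    return mySchema.newContext().validate(doc);
});
MyCollection.batchInsert(validDocs);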

Related

Update join in mongoDB. Is it possible?

Given these three documents:
db.test.save({"_id":1, "foo":"bar1", "xKey": "xVal1"});
db.test.save({"_id":2, "foo":"bar2", "xKey": "xVal2"});
db.test.save({"_id":3, "foo":"bar3", "xKey": "xVal3"});
And a separate array of information that references those documents:
[{"_id":1, "foo":"bar1Upd"},{"_id":2, "foo":"bar2Upd"}]
Is it possible to update "foo" on the two referenced documents (1 and 2) in a single operation?
I know I can loop through the array and do them one by one but I have thousands of documents, which means too many round trips to the server.
Many thanks for your thoughts.
It's not possible to update "foo" on the two referenced documents (1 and 2) in a single atomic operation, as MongoDB has no such mechanism. However, since you have a large collection, one option is to take advantage of the Bulk API, which lets you send your updates to the server in batches instead of one request per update.
The process involves looping over all the matched documents within the array and queuing the bulk updates, which at least allows many operations to be sent in a single request with a single response.
This gives you much better performance, since you won't be sending a request per update but only one request per 500 updates, making your updates more efficient and quicker.
EDIT:
Choosing a lower batch value is generally a deliberate choice. As noted in the documentation there, MongoDB by default will send operations to the server in batches of at most 1000, and there is no guarantee that these default 1000-operation requests actually fit under the 16MB BSON limit. So you still need to be on the "safe" side and impose a lower batch size that you can effectively manage, so that each batch totals less than the 16MB data limit when it is sent to the server.
Let's use an example to demonstrate the approaches above:
a) If using MongoDB v3.0 or below:
var bulk = db.test.initializeOrderedBulkOp(),
    largeArray = [{"_id":1, "foo":"bar1Upd"},{"_id":2, "foo":"bar2Upd"}],
    counter = 0;

largeArray.forEach(function(doc) {
    bulk.find({ "_id": doc._id }).updateOne({ "$set": { "foo": doc.foo } });
    counter++;
    // Execute per 500 operations and re-initialize for the next batch
    if (counter % 500 == 0) {
        bulk.execute();
        bulk = db.test.initializeOrderedBulkOp();
    }
});

// Flush the remaining operations
if (counter % 500 != 0) bulk.execute();
b) If using MongoDB v3.2.X or above (MongoDB 3.2 has since deprecated the Bulk() API and provides a newer set of APIs using bulkWrite()):
var largeArray = [{"_id":1, "foo":"bar1Upd"},{"_id":2, "foo":"bar2Upd"}],
    bulkUpdateOps = [];

largeArray.forEach(function(doc) {
    bulkUpdateOps.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": { "$set": { "foo": doc.foo } }
        }
    });
    // Send the batch once it reaches 500 operations
    if (bulkUpdateOps.length === 500) {
        db.test.bulkWrite(bulkUpdateOps);
        bulkUpdateOps = [];
    }
});

// Flush the remaining operations
if (bulkUpdateOps.length > 0) db.test.bulkWrite(bulkUpdateOps);
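As a side note, bulkWrite() returns a result document whose counters you can inspect to confirm what was written; for example, in the shell:
// Illustrative only: capture and inspect the result of a bulkWrite() call
var result = db.test.bulkWrite([
    { "updateOne": { "filter": { "_id": 1 }, "update": { "$set": { "foo": "bar1Upd" } } } }
]);
print("matched: " + result.matchedCount + ", modified: " + result.modifiedCount);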

How to get pointer of current document to update in updateMany

I have the latest MongoDB 3.2, and a collection of many items that each have a timestamp stored in milliseconds.
I need to convert the milliseconds to a Date object, and for now I use this function:
db.myColl.find().forEach(function (doc) {
    doc.date = new Date(doc.date);
    db.myColl.save(doc);
})
It takes a very long time to update 2 million rows.
I tried to use updateMany (it seems to be very fast), but how can I get access to the current document? Is there any chance to rewrite the query above using updateMany?
Thank you.
You can leverage other bulk update APIs, like the bulkWrite() method, which allow you to use an iterator to access each document, manipulate it, add the modified document to a list, and then send the list of update operations in a batch to the server for execution.
The following demonstrates this approach: you use the cursor's forEach() method to iterate the collection and modify each document, at the same time pushing the update operations onto a batch of about 1000 documents, which can then be written at once using bulkWrite().
This is as efficient as using updateMany(), since it uses the same underlying bulk write operations:
var cursor = db.myColl.find({ "date": { "$exists": true, "$type": 1 } }),
    bulkUpdateOps = [];

cursor.forEach(function(doc) {
    var newDate = new Date(doc.date);
    bulkUpdateOps.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": { "$set": { "date": newDate } }
        }
    });
    // Send the batch once it reaches 1000 operations
    if (bulkUpdateOps.length == 1000) {
        db.myColl.bulkWrite(bulkUpdateOps);
        bulkUpdateOps = [];
    }
});

// Flush the remaining operations
if (bulkUpdateOps.length > 0) { db.myColl.bulkWrite(bulkUpdateOps); }
An iterating query like this is currently the only way to set a field's value from its own value or that of another field (you can compute the new value from more than one field of the document).
There is a way to improve the performance of such a query: execute it via the mongo shell directly on the server, so no data is passed over the wire to a client.

Get the count of Fields in each document through query using MongoDB java driver

Is it possible to get the number of fields in a document using a query from the MongoDB Java driver?
Example:
Document1: {_id:1, "type1":10, "type2":30, "ABC":123, "DEF":345}
Document2: {_id:2, "type2":30, "ABC":123, "DEF":345}
Note: in the second document the "type1" key doesn't exist.
When I project, is it possible to project only "type1" and "type2" and get the number of those fields existing in each document?
With my current code I am fetching all the documents and, for each document in the cursor, individually checking whether each key I am looking for is present.
The code snippet is as follows:
MongoClient mcl = new MongoClient();
MongoDatabase mdb = mcl.getDatabase("test");
MongoCollection<Document> mcol = mdb.getCollection("testcol");
FindIterable<Document> findIterable = mcol.find();
MongoCursor<Document> cursor = findIterable.iterator();

// There are 120 types; check whether each one is present
while (cursor.hasNext()) {
    Document doc = cursor.next();
    int numberOfTypes = 0;
    for (int i = 1; i <= 120; i++) {
        if (doc.containsKey("type" + i)) {
            numberOfTypes++;
        }
    }
    System.out.println("*********************************************");
    System.out.println("_id " + doc.get("_id"));
    System.out.println("Number of types in this document: " + numberOfTypes);
    System.out.println("*********************************************");
}
This code works when there are few records and doesn't overload the application. But suppose there are 5,000,000 documents, each containing 120 types: the application crashes, as there is heavy garbage collection because a Document object is created on every iteration. Is there any other approach through which we can achieve the functionality stated above?
From your Java code I read
project only "type1" and "type2" and get number of fields existing in that document.
as
project only type[1..120] fields and number of such fields in the document
With this assumption, you can map-reduce it as follows:
db.testcol.mapReduce(
    function() {
        var value = { count: 0 };
        for (var i = 1; i <= 120; i++) {
            var key = "type" + i;
            if (this.hasOwnProperty(key)) {
                value[key] = this[key];
                value.count++;
            }
        }
        if (value.count > 0) {
            emit(this._id, value);
        }
    },
    function() {
        // nothing to reduce
    },
    {
        out: { inline: 1 }
    });
out: {inline: 1} works for small datasets, when the result fits into the 16MB limit. For larger results you need to output to a collection, which you can then query and iterate as usual.
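A minimal sketch of the collection-output variant, under the same assumptions; the output collection name type_counts is hypothetical:
// Same map logic as above, repeated so this snippet is self-contained
var mapFn = function() {
    var value = { count: 0 };
    for (var i = 1; i <= 120; i++) {
        var key = "type" + i;
        if (this.hasOwnProperty(key)) {
            value[key] = this[key];
            value.count++;
        }
    }
    if (value.count > 0) {
        emit(this._id, value);
    }
};
// "type_counts" is a hypothetical output collection name
db.testcol.mapReduce(mapFn, function() {}, { out: "type_counts" });
// Iterate the results as a normal collection; each result doc is {_id, value}
db.type_counts.find().forEach(printjson);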

Control Meteor Mongo Query Batch Size

I'm setting up a simple pub/sub on a mongo collection in a Meteor application.
// On Server
Meteor.publish('records', function (params) {
    if (params.gender === "Male") {
        params.gender = "M";
    } else if (params.gender === "Female") {
        params.gender = "F";
    }
    return Records.find({
        gender: params.gender || { $exists: true },
        age: {
            $lte: params.ageRange.max,
            $gte: params.ageRange.min
        }
    });
});
// On Client
this.computationBlock = Tracker.autorun(() => {
    Meteor.subscribe("records",
        _.pick(this, ["gender", "ageRange"]));
    RecordActions.recordsChange(Records.find({}).fetch());
});
These queries occasionally return 10,000+ documents in total, and the issue is that they come down in batches of about 300-500 documents. I have visualizations of aggregated metrics from the query, so these visualizations often remain in flux for over 10 seconds while each new batch of the query result is brought down through the subscription/publication. I'm fairly sure that if I could configure the batchSize property of the mongo cursor returned from Collection.find(), I could alleviate the problem simply by returning more documents per batch, but I cannot find how to do this in Meteor land.
cursor.batchSize is undefined in both the client and server code, although it is a standard part of the native mongo API (http://docs.mongodb.org/manual/reference/method/cursor.batchSize/). Has anyone had any luck configuring this parameter, or even better, telling Meteor to fetch the entire query result in one shot, as opposed to batching the results?
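One workaround, if reactivity is not required, is to bypass the publication for this data and fetch the whole result set through a Meteor method, since a method result arrives in a single response rather than in DDP batches. A minimal sketch; the method name fetchRecords is hypothetical:
// On Server: a hypothetical one-shot alternative to the publication above
Meteor.methods({
    fetchRecords: function (params) {
        // same selector as the 'records' publication
        return Records.find({
            gender: params.gender || { $exists: true },
            age: { $lte: params.ageRange.max, $gte: params.ageRange.min }
        }).fetch();
    }
});

// On Client: one round trip, but no reactive updates afterwards
Meteor.call("fetchRecords", _.pick(this, ["gender", "ageRange"]), (error, records) => {
    if (!error) RecordActions.recordsChange(records);
});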

mongodb find and update as once with function

I wonder whether there is a way to use MongoDB's aggregation framework or mapReduce to perform data calculations and directly update a document field with the result. For example, a document would look like this:
{
    a: 1,
    b: 2,
    result: undefined
}
Now I would like to run a process over each document of the collection, sum a + b for each document, and finally store the result within the document, like so:
{
    a: 1,
    b: 2,
    result: 3
}
Maybe it works with mapReduce, but I don't get how the document would be updated with mapReduce. Any suggestions?
I suggest a mongo bulk update instead of map-reduce, as below:
var bulk = db.collectionName.initializeOrderedBulkOp();
var counter = 0;

db.collectionName.find().forEach(function(data) {
    var updoc = {
        "$set": {}
    };
    var myKey = "result";
    updoc["$set"][myKey] = data.a + data.b;
    // queue the update
    bulk.find({
        "_id": data._id
    }).update(updoc);
    counter++;
    // Drain and re-initialize every 1000 update statements
    if (counter % 1000 == 0) {
        bulk.execute();
        bulk = db.collectionName.initializeOrderedBulkOp();
    }
})

// Flush the rest of the queue
if (counter % 1000 != 0) bulk.execute();
You can't refer to the document itself in an update. You have to iterate through the documents, read the required values, manipulate them as required, and update each document with a function. See this answer for an example, or this one for server-side eval():
db.eval(function() {
    db.collection.find({ /* add criteria if required */ }).forEach(function(document) {
        document.result = document.a + document.b;
        db.collection.save(document);
    });
});
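As a side note, on MongoDB 4.2 and newer (more recent than the versions discussed above), an update can take an aggregation pipeline, which lets the server compute the field from the document's own values in a single call. A minimal sketch:
// MongoDB 4.2+ only: a pipeline update can reference the document's own fields
db.collectionName.updateMany(
    {},  // match all documents
    [ { "$set": { "result": { "$add": ["$a", "$b"] } } } ]
);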