How to get pointer of current document to update in updateMany - mongodb

I have the latest MongoDB 3.2, and there is a collection of many items that have a timestamp.
I need to convert the milliseconds to a Date object, and at the moment I use this function:
db.myColl.find().forEach(function (doc) {
    doc.date = new Date(doc.date);
    db.myColl.save(doc);
})
It takes a very long time to update 2 million rows.
I tried to use updateMany (it seems to be very fast), but how can I get access to the current document? Is there any way to rewrite the query above using updateMany?
Thank you.

You can leverage other bulk update APIs, such as the bulkWrite() method, which allows you to use an iterator to access each document, manipulate it, add the modified document to a list, and then send the list of update operations in a batch to the server for execution.
The following demonstrates this approach: use the cursor's forEach() method to iterate the collection and modify each document, while pushing each update operation into a batch of about 1000 documents, which can then be sent at once using the bulkWrite() method.
This is as efficient as using updateMany(), since it uses the same underlying bulk write operations:
var cursor = db.myColl.find({ "date": { "$exists": true, "$type": 1 } }),
    bulkUpdateOps = [];

cursor.forEach(function (doc) {
    var newDate = new Date(doc.date);
    bulkUpdateOps.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": { "$set": { "date": newDate } }
        }
    });

    if (bulkUpdateOps.length == 1000) {
        db.myColl.bulkWrite(bulkUpdateOps);
        bulkUpdateOps = [];
    }
});

if (bulkUpdateOps.length > 0) { db.myColl.bulkWrite(bulkUpdateOps); }

The query you already have is currently the only way to set a field's value from the field itself or from another field's value (you can compute the new value using more than one field from the document).
There is a way to improve the performance of that query: execute it via the mongo shell directly on the server, so no data has to travel to a remote client.
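For example, a minimal sketch (the script and database names here are made up): save the conversion loop to a file and run it with the mongo shell on the server host itself:
// convertDates.js -- hypothetical script file holding the conversion loop
db.myColl.find({ "date": { "$exists": true, "$type": 1 } }).forEach(function (doc) {
    db.myColl.update({ "_id": doc._id }, { "$set": { "date": new Date(doc.date) } });
});
It could then be executed on the server with something like mongo myDatabase convertDates.js, avoiding round trips to a remote shell.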

Related

Spring WebFlux + MongoDB: Tailable Cursor and Aggregation

I'm new to WebFlux and MongoDB. I'm trying to use aggregation on a capped collection with a tailable cursor, but I haven't had any success.
I'd like to execute this MongoDB query:
db.structures.aggregate([
    {
        $match: {
            id: { $in: [8244, 8052] }
        }
    },
    { $sort: { id: 1, lastUpdate: 1 } },
    {
        $group: {
            _id: { id: "$id" },
            lastUpdate: { $last: "$lastUpdate" }
        }
    }
])
ReactiveMongoOperations gives me the option to "tail" or to "aggregate".
I'm able to execute the aggregation:
MatchOperation match = new MatchOperation(Criteria.where("id").in(8244, 8052));
GroupOperation group = Aggregation.group("id", "$id").last("$lastUpdate").as("lastUpdate");
Aggregation aggregate = Aggregation.newAggregation(match, group);
Flux<Structure> result = mongoOperation.aggregate(aggregate,
"structures", Structure.class);
Or the tailable cursor:
Query query = new Query();
query.addCriteria(Criteria.where("id").in(8244, 8052));
Flux<Structure> result = mongoOperation.tail(query, Structure.class);
Is it possible to use tail and aggregation together?
Using aggregation was the way that I found to get only the last inserted document for each id.
Without aggregation I get: [screenshot: query without aggregation]
With aggregation: [screenshot: query with aggregation]
Thanks in advance.
The tailable cursor query creates a Flux that never completes (it never emits an onComplete event), and that Flux emits records as they are inserted into the database. Because of that, I would think aggregations are not allowed by the database engine on a tailable cursor.
So the aggregation doesn't really make sense here, because with every newly inserted record the aggregation would need to be recomputed. Technically you can do a running aggregation, where for every returned record you compute the desired aggregate record and send it downstream.
One possible solution would be to do the aggregations programmatically on the returned "infinite" Flux:
mongoOperation.tail(query, Structure.class)
    .groupBy(Structure::id) // create independent Fluxes based on id
    .flatMap(groupedFlux ->
        groupedFlux.scan((result, nextStructure) -> { // scan is like reduce but emits intermediate results
            log.info("intermediate result is: {}", result);
            if (result.getLastUpdate() > nextStructure.getLastUpdate()) {
                return result;
            } else {
                result.setLastUpdate(nextStructure.getLastUpdate());
                return result;
            }
        }));
On the other hand, you should probably revisit your use case and what you need to accomplish here, and see whether something other than a capped collection should be used, or whether the aggregation part is redundant (i.e. if newly inserted records always have a lastUpdate property larger than the previous record's).
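For illustration only (assuming lastUpdate really is monotonically increasing per id), the aggregation step could then be dropped entirely and the tail used on its own, since every emitted record is already the latest one for its id:
// Sketch: if lastUpdate only ever grows, each emitted Structure is already
// the most recent one for its id, so no running aggregation is needed.
mongoOperation.tail(query, Structure.class)
    .doOnNext(structure -> log.info("latest structure: {}", structure));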

Inserting multiple documents into mongodb using one call in Meteor

In the mongo shell, it is possible to insert an array of documents with one call. In a Meteor project, I have tried using
MyCollection = new Mongo.Collection("my_collection")
documentArray = [{"one": 1}, {"two": 2}]
MyCollection.insert(documentArray)
However, when I check my_collection from the mongo shell, it shows that only one document has been inserted, and that document contains the entire array as if it had been a map:
db.my_collection.find({})
{ "_id" : "KPsbjZt5ALZam4MTd", "0" : { "one" : 1 }, "1" : { "two" : 2} }
Is there a Meteor call that I can use to add a series of documents all at once, or must I use a technique such as the one described here?
I imagine that inserting multiple documents in a single call would optimize performance on the client side, where the new documents would become available all at once.
You could use the bulk API to do the bulk insert on the server side. Manipulate the array using the forEach() method and, within the loop, insert each document using bulk insert operations, which are simply abstractions on top of the server that make it easy to build bulk operations.
Note that for MongoDB servers older than 2.6 the API will down-convert the operations. However, it's not possible to down-convert 100%, so there might be some edge cases where it cannot correctly report the right numbers.
You can get raw access to the collection and database objects of the npm MongoDB driver through the rawCollection and rawDatabase methods on Mongo.Collection:
MyCollection = new Mongo.Collection("my_collection");

if (Meteor.isServer) {
    Meteor.startup(function () {
        Meteor.methods({
            insertData: function () {
                var bulkOp = MyCollection.rawCollection().initializeUnorderedBulkOp(),
                    counter = 0,
                    documentArray = [{ "one": 1 }, { "two": 2 }];

                documentArray.forEach(function (data) {
                    bulkOp.insert(data);
                    counter++;
                    // Send to the server in batches of 1000 insert operations
                    if (counter % 1000 == 0) {
                        // Execute per 1000 operations and re-initialize the bulk builder
                        bulkOp.execute(function (e, result) {
                            // do something with result
                        });
                        bulkOp = MyCollection.rawCollection().initializeUnorderedBulkOp();
                    }
                });

                // Clean up the remaining queued operations
                if (counter % 1000 != 0) {
                    bulkOp.execute(function (e, result) {
                        // do something with result
                    });
                }
            }
        });
    });
}
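For completeness, a minimal sketch of invoking the method defined above from the client (error handling is up to you):
Meteor.call("insertData", function (error, result) {
    if (error) {
        console.error(error);
    }
});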
I'm currently using the mikowals:batch-insert package.
Your code would then work with one small change:
MyCollection = new Mongo.Collection("my_collection");
documentArray = [{"one": 1}, {"two": 2}];
MyCollection.batchInsert(documentArray);
The one drawback of this I've noticed is that it doesn't honor simple-schema.

Mongodb - return an array of _id of all updated documents

I need to update some documents in one collection, and send an array of the _ids of the updated documents to another collection.
Since update() returns the number of updated items, not their _ids, I've come up with the following to get the array:
var docsUpdated = [];
var cursor = myCollection.find(<myQuery>);
cursor.forEach(function (doc) {
    myCollection.update({ _id: doc._id }, <myUpdate>, function (error, response) {
        docsUpdated.push(doc._id);
    });
});
Or I could do:
var docsUpdated = myCollection.distinct("_id", <myQuery>);
myCollection.update(<myQuery>, <myUpdate>, {multi : true});
I'm guessing the second version would be faster because it only calls the database twice. But both seem annoyingly inefficient - is there another way of doing this without multiple database calls? Or am I overcomplicating things?
I think you need the .aggregate() method:
db.orders.aggregate([
    { $group: { _id: "$_id" } }
])
Something along those lines returns the _ids of all the documents in the collection.
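Putting that together with your update (a sketch only, keeping your <myQuery> and <myUpdate> placeholders): run an aggregation with $match to collect the matching _ids first, then issue a single multi-update against those _ids, so the array of updated ids is ready for the other collection:
var matched = db.myCollection.aggregate([
    { "$match": <myQuery> },
    { "$group": { "_id": null, "ids": { "$addToSet": "$_id" } } }
]).toArray();

var docsUpdated = matched.length ? matched[0].ids : [];
db.myCollection.update({ "_id": { "$in": docsUpdated } }, <myUpdate>, { "multi": true });
This is still two round trips, like your distinct()-based version, but the ids come back in a single array that can be inserted into the second collection.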

Making a collection from collection subset in mongodb

I have a huge collection of documents (more than two million), and I find myself querying a very small subset of it, using something like:
scs = db.balance_sheets.find({"9087n":{$gte:40}, "20/58n":{ $lte:40000000}})
which gives fewer than 5k results. The question is: can I create a new collection with the results of this query?
I tried insert:
db.scs.insert(db.balance_sheets.find({"9087n":{$gte:40}, "20/58n":{ $lte:40000000}}).toArray())
But it gives me errors: Socket say send() errno:32 Broken pipe 127.0.0.1:27017
I tried aggregate:
db.balance_sheets.aggregate([{ "9087n":{$gte:40}, "20/58n":{ $lte:40000000}} ,{$out:"pme"}])
And I get "exception: A pipeline stage specification object must contain exactly one field."
Any hints?
Thanks
The first option would be:
var cursor = db.balance_sheets.find({"9087n":{"$gte": 40}, "20/58n":{ $lte:40000000}});
while (cursor.hasNext()) {
var doc = cursor.next();
db.pme.save(doc);
};
As for the aggregation, the error occurs because the query document was passed as a bare pipeline stage instead of being wrapped in a $match stage; try:
db.balance_sheets.aggregate([
{
"$match": { "9087n": { "$gte": 40 }, "20/58n": { "$lte": 40000000 } }
},
{ "$out": "pme" }
]);
For improved performance, especially when dealing with large collections, take advantage of the Bulk API for bulk inserts: you send the operations to the server in batches of, say, 500, which gives better performance because you are not sending every request to the server, just one request per 500 operations.
The following demonstrates this approach. The first example uses the Bulk API, available in MongoDB versions >= 2.6 and < 3.2, to insert all the documents matching the query from the balance_sheets collection into the pme collection:
var bulk = db.pme.initializeUnorderedBulkOp(),
    counter = 0;

db.balance_sheets.find({
    "9087n": { "$gte": 40 },
    "20/58n": { "$lte": 40000000 }
}).forEach(function (doc) {
    bulk.insert(doc);
    counter++;
    if (counter % 500 == 0) {
        // Execute per 500 operations and re-initialize the bulk builder
        bulk.execute();
        bulk = db.pme.initializeUnorderedBulkOp();
    }
})

// Clean up remaining operations in the queue
if (counter % 500 != 0) { bulk.execute(); }
The next example applies to the new MongoDB version 3.2, which has since deprecated the Bulk API and provided a newer set of APIs based on bulkWrite():
var bulkOps = db.balance_sheets.find({
"9087n": { "$gte": 40 },
"20/58n": { "$lte": 40000000 }
}).map(function (doc) {
return { "insertOne" : { "document": doc } };
});
db.pme.bulkWrite(bulkOps);

How to change date and time in MongoDB? [duplicate]

This question already has answers here:
Update MongoDB field using value of another field
(12 answers)
Closed 5 years ago.
I have a DB with names and dates. I need to change the old date to a date that is 3 days later. For example, if oldDay is 01.02.2015, the new one is 03.02.2015.
I tried just putting one fixed date on all documents, but that would mean all exams end up on the same day:
db.getCollection('school.exam').update( {}, { $set : { "oldDay" : new ISODate("2016-01-11T03:34:54Z") } }, true, true);
The problem is just how to replace each old date with one a few days later.
Since MongoDB doesn't yet support applying the $inc operator to dates (see the JIRA ticket on that here), an alternative way to increment the date field is to iterate the cursor returned by the find() method using the forEach() method and, within the loop, convert the old date field to a timestamp, add the number of days in milliseconds to the timestamp, and then update the field using the $set operator.
Take advantage of the Bulk API for bulk updates, which offers better performance: you send the operations to the server in batches of, say, 1000, so instead of sending every request to the server you send one request per 1000 operations.
The following demonstrates this approach. The first example uses the Bulk API available in MongoDB versions >= 2.6 and < 3.2; it updates all the documents in the collection by adding 3 days to the date field:
var bulk = db.getCollection("school.exam").initializeUnorderedBulkOp(),
    counter = 0,
    daysInMilliSeconds = 86400000,
    numOfDays = 3;

// BSON type 9 = Date, so only documents whose oldDay field is a real date are matched
db.getCollection("school.exam").find({ "oldDay": { "$exists": true, "$type": 9 } }).forEach(function (doc) {
    var incDate = new Date(doc.oldDay.getTime() + (numOfDays * daysInMilliSeconds));
    bulk.find({ "_id": doc._id }).updateOne({
        "$set": { "oldDay": incDate }
    });
    counter++;
    if (counter % 1000 == 0) {
        // Execute per 1000 operations and re-initialize the bulk builder
        bulk.execute();
        bulk = db.getCollection("school.exam").initializeUnorderedBulkOp();
    }
})

if (counter % 1000 != 0) { bulk.execute(); }
The next example applies to the new MongoDB version 3.2, which has since deprecated the Bulk API and provided a newer set of APIs based on bulkWrite():
var bulkOps = [],
    daysInMilliSeconds = 86400000,
    numOfDays = 3;

// BSON type 9 = Date, so only documents whose oldDay field is a real date are matched
db.getCollection("school.exam").find({ "oldDay": { "$exists": true, "$type": 9 } }).forEach(function (doc) {
    var incDate = new Date(doc.oldDay.getTime() + (numOfDays * daysInMilliSeconds));
    bulkOps.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": { "$set": { "oldDay": incDate } }
        }
    });
})

db.getCollection("school.exam").bulkWrite(bulkOps, { "ordered": true });