How to bulk copy one field to a first object of array and update the document in MongoDB? - mongodb

I want to copy price information in my document to prices[] array.
var entitiesCol = db.getCollection('entities');
entitiesCol.find({"type": "item"}).forEach(function(item){
item.prices = [ {
"value": item.price
}];
entitiesCol.save(item);
});
It takes too long and some fields are not updated.
I am using Mongoose on the server side, so I can use that as well.
What can I do about this?

In the mongo shell, you can use the bulkWrite() method to carry out the updates in a fast and efficient manner. Consider the following example:
var entitiesCol = db.getCollection('entities'),
counter = 0,
ops = [];
entitiesCol.find({
"type": "item",
"prices.0": { "$exists": false }
}).snapshot().forEach(function(item){
ops.push({
"updateOne": {
"filter": { "_id": item._id },
"update": {
"$push": {
"prices": { "value": item.price }
}
}
}
});
counter++;
if (counter % 500 === 0) {
entitiesCol.bulkWrite(ops);
ops = [];
}
})
if (counter % 500 !== 0)
entitiesCol.bulkWrite(ops);
The counter variable above lets you batch the update operations when your collection is large. The writes are sent to the server in batches of 500, which performs better because you make one round trip per 500 operations instead of one per document.
For bulk operations MongoDB imposes a default internal limit of 1000 operations per batch (raised to 100,000 in MongoDB 3.6), so choosing 500 gives you control over the batch size rather than leaving it to the default. This matters for larger operations, i.e. those touching more than 1000 documents.
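The batching idea can be tried without a database. The following is a minimal sketch, assuming a hypothetical `fakeCollection` object that stands in for a real collection and only records how many operations each `bulkWrite()` call receives; `chunk` is an illustrative helper, not part of the MongoDB API.

```javascript
// Split an array of bulk operations into batches of at most `size` items.
function chunk(ops, size) {
  const batches = [];
  for (let i = 0; i < ops.length; i += size) {
    batches.push(ops.slice(i, i + size));
  }
  return batches;
}

// Hypothetical stand-in for a collection: it only records each bulkWrite call.
const fakeCollection = {
  calls: [],
  bulkWrite(batch) { this.calls.push(batch.length); }
};

// Build 1200 updateOne operations shaped like the ones in the answer above.
const priceOps = [];
for (let id = 1; id <= 1200; id++) {
  priceOps.push({
    updateOne: {
      filter: { _id: id },
      update: { $push: { prices: { value: id * 10 } } }
    }
  });
}

// One bulkWrite call (one round trip) per batch of up to 500 operations.
chunk(priceOps, 500).forEach(batch => fakeCollection.bulkWrite(batch));
console.log(fakeCollection.calls); // sizes of the batches that were sent
```

With 1200 operations and a batch size of 500, the stub sees three calls instead of 1200 individual writes.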

Related

Compare Time to Document Interval and Update

Use Case:
I've got a MongoDB collection with a couple million documents. Documents in this collection must be updated sometimes. Therefore I've set up a monitorFrequency field which defines that a specific document must be updated every 6, 12, 24 or 720 hours. Additionally I set up a field called lastRefreshAt, which is a timestamp of the last actual update.
The problem:
How can I select all documents from my profiles collection that need to be refreshed again (because more than monitorFrequency hours have passed since lastRefreshAt)?
Should I run that on a single query which would only return those documents which need to be refreshed again or should I rather iterate on all documents with a cursor and check in my node application if the document needs to be refreshed or not?
I know how to do approach #2, but I am not sure which approach to choose or what the query for #1 would look like.
There are a couple of approaches depending on available architecture and choices. Some are good choices and some are bad, but we might as well explain them all.
Use $where with multi-update
As a first option to examine, you could use $where to calculate the difference for selection and feed directly to .update() or .updateMany() for that matter:
db.profiles.update(
{
"$where": function() {
return (Date.now() - this.lastRefreshAt.valueOf())
> ( this.monitorFrequency * 1000 * 60 * 60 );
}
},
{ "$currentDate": { "lastRefreshAt": true } },
{ "multi": true }
)
Which pretty simply works out the milliseconds difference between the current "lastRefreshAt" value and the current Date value and compares that to the stored "monitorFrequency" converted into milliseconds itself.
The $currentDate is applied because it is a "multi" update that runs against all matched documents, so this ensures the "server timestamp" at the actual time of document update is written to each document.
It's not fantastic as it does require a full collection scan in order to select the documents via calculation and thus cannot use an index. Plus it's JavaScript evaluation, which not being native code does add some overhead.
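The comparison that the $where function performs can be checked in plain JavaScript. This is a small sketch with made-up sample documents; `needsRefresh` and `HOUR_MS` are illustrative names, not something from the original answer.

```javascript
const HOUR_MS = 1000 * 60 * 60;

// True when more than `monitorFrequency` hours have passed since lastRefreshAt.
function needsRefresh(doc, now) {
  const elapsedMs = now.getTime() - doc.lastRefreshAt.getTime();
  return elapsedMs > doc.monitorFrequency * HOUR_MS;
}

const checkTime = new Date("2017-06-01T12:00:00Z");
// Both sample documents were last refreshed 7 hours before checkTime.
const staleDoc = { monitorFrequency: 6, lastRefreshAt: new Date("2017-06-01T05:00:00Z") };
const freshDoc = { monitorFrequency: 12, lastRefreshAt: new Date("2017-06-01T05:00:00Z") };

console.log(needsRefresh(staleDoc, checkTime)); // true: 7h elapsed, 6h allowed
console.log(needsRefresh(freshDoc, checkTime)); // false: 7h elapsed, 12h allowed
```

Because the result depends on two fields of the same document, the server has to evaluate it for every document, which is why no index can help here.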
Loop the matched selection
So JavaScript is not that great a selection option in general when other options apply. Instead try using the aggregation framework for the calculation and loop the cursor result:
var ops = [];
db.profiles.aggregate([
{ "$redact": {
"$cond": {
"if": {
"$gt": [
{ "$subtract": [new Date(), "$lastRefreshAt"] },
{ "$multiply": ["$monitorFrequency", 1000 * 60 * 60] }
]
},
"then": "$$KEEP",
"else": "$$PRUNE"
}
}}
]).forEach(doc => {
ops.push({
"updateOne": {
"filter": { "_id": doc._id },
"update": { "$currentDate": { "lastRefreshAt": true } }
}
});
if ( ops.length > 1000 ) {
db.profiles.bulkWrite(ops);
ops = [];
}
})
if ( ops.length > 0 ) {
db.profiles.bulkWrite(ops);
ops = [];
}
So again that's a collection scan due to the calculation but it is done with native operators, so that part at least should be a bit faster. Also from a technical standpoint it's a little different because the new Date() is actually established at the time of request and not per document iterated as it would be using $where. Lacking an operator to produce the "current date" internally, there is no way for the aggregation framework to do this per iteration.
And of course, instead of just applying our "update" expression as it matches documents, we are looping the result cursor and applying a function. So whilst there are "some" gains, there is also additional overhead. Mileage may vary as to performance and practicality.
Parallel Updates
Personally I would do neither of the above and simply run a query selecting each marked "monitorFrequency" and looking for the dates between the boundaries that exceed the allowed difference.
As a simple example using NodeJS to implement Promise.all() for parallel calls:
const MongoClient = require('mongodb').MongoClient;
const oneHour = 1000 * 60 * 60;
(async function() {
let db;
try {
db = await MongoClient.connect('mongodb://localhost/test');
let collection = db.collection('profiles');
let intervals = [6, 12, 24, 720];
let snapDate = new Date();
await Promise.all(
intervals.map( (monitorFrequency,i) =>
collection.updateMany(
{
monitorFrequency,
"lastRefreshAt": Object.assign(
{ "$lt": new Date(snapDate.valueOf() - intervals[i] * oneHour) },
// only add the "$gt" bound when a longer interval exists
(i < intervals.length - 1) ?
{ "$gt": new Date(snapDate.valueOf() - intervals[i+1] * oneHour) }
: {}
)
},
{ "$currentDate": { "lastRefreshAt": true } },
)
)
);
} catch(e) {
console.error(e);
} finally {
if (db) db.close(); // db is undefined when the connection itself failed
}
})();
This would allow you to index on the two fields and allow optimal selection, and since the "date ranges" are paired to their calculated difference from "monitorFrequency" then those documents that "require refresh" are the only ones that get selected for update.
Given the finite number of possible intervals, this is what I would suspect to be the most optimal solution. But the construction, along with the fact that the actual "update" portion remains the same for each selection, leads to one other option.
Use $or for each selection.
Much the same logic as above, but instead applied to build an $or condition for the "query" portion of a "single" update. It is an "array of criteria" after all, which is essentially the same as the "array of queries" we are issuing above. So just turn it around a little:
let oneHour = 1000 * 60 * 60;
let intervals = [6, 12, 24, 720];
let snapDate = new Date();
db.profiles.updateMany(
{
"$or": intervals.map( (monitorFrequency,i) =>
({
monitorFrequency,
"lastRefreshAt": Object.assign(
{ "$lt": new Date(snapDate.valueOf() - intervals[i] * oneHour) },
(i < intervals.length - 1) ?
{ "$gt": new Date(snapDate.valueOf() - intervals[i+1] * oneHour) }
: {}
)
})
)
},
{ "$currentDate": { "lastRefreshAt": true } }
)
This then becomes one simple statement and of course can actually use indexes where available. Generally this is what you should be doing, though as I have suggested my intuition tells me that 4 threads of execution constrained only by the slowest one gets the job done slightly faster. Again, mileage may vary on that but logic dictates that this is so.
So the basic lesson here is "whilst you may think" that the logical approach is to calculate the values and compare within the database itself, it's actually the worst possible thing you can do for query performance.
The simple approach is to work out the criteria that should select the documents you want "before" you issue the query statement to the server. This means you are comparing against "concrete values" rather than "calculation results". And "concrete values" can actually be indexed, which is generally what you want for database queries.
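As an illustration of working out concrete values up front, here is a minimal sketch that builds a filter with one precomputed cutoff date per interval. It is a simplification of the answer's paired ranges, assuming only a "$lt" cutoff per monitorFrequency is wanted; `buildStaleFilter` and `ONE_HOUR_MS` are hypothetical names.

```javascript
const ONE_HOUR_MS = 1000 * 60 * 60;

// Build a query filter whose date cutoffs are computed in the client,
// so the server only compares stored fields against constants.
function buildStaleFilter(intervals, snapDate) {
  return {
    $or: intervals.map(monitorFrequency => ({
      monitorFrequency,
      lastRefreshAt: {
        $lt: new Date(snapDate.getTime() - monitorFrequency * ONE_HOUR_MS)
      }
    }))
  };
}

const staleFilter = buildStaleFilter(
  [6, 12, 24, 720],
  new Date("2017-06-01T12:00:00Z")
);
console.log(JSON.stringify(staleFilter, null, 2));
```

A filter of this shape could be passed as the first argument to updateMany(), and a compound index on monitorFrequency and lastRefreshAt would then be usable for each $or branch.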

Update join in mongoDB. Is it possible?

Given these three documents:
db.test.save({"_id":1, "foo":"bar1", "xKey": "xVal1"});
db.test.save({"_id":2, "foo":"bar2", "xKey": "xVal2"});
db.test.save({"_id":3, "foo":"bar3", "xKey": "xVal3"});
And a separate array of information that references those documents:
[{"_id":1, "foo":"bar1Upd"},{"_id":2, "foo":"bar2Upd"}]
Is it possible to update "foo" on the two referenced documents (1 and 2) in a single operation?
I know I can loop through the array and do them one by one but I have thousands of documents, which means too many round trips to the server.
Many thanks for your thoughts.
It's not possible to update "foo" on the two referenced documents (1 and 2) in a single atomic operation as MongoDB has no such mechanism. However, seeing that you have a large collection, one option is to take advantage of the Bulk API which allows you to send your updates in batches instead of every update request to the server.
The process involves looping over all the documents in the array and queuing Bulk updates, which at least allows many operations to be sent in a single request with a single response.
This gives you much better performance since you won't be sending every request to the server but just once in every 500 requests, thus making your updates more efficient and quicker.
-EDIT-
The reason for choosing a lower value is generally a controlled choice. As noted in the documentation there, MongoDB by default will send to the server in batches of at most 1000 operations at a time, and there is no guarantee that these default 1000-operation requests actually fit under the 16MB BSON limit. So you would still need to be on the "safe" side and impose a lower batch size that you can effectively manage, so that each batch totals less than the data limit in size when sent to the server.
Let's use an example to demonstrate the approaches above:
a) If using MongoDB v3.0 or below:
var bulk = db.test.initializeOrderedBulkOp(),
largeArray = [{"_id":1, "foo":"bar1Upd"},{"_id":2, "foo":"bar2Upd"}],
counter = 0;
largeArray.forEach(function(doc) {
bulk.find({ "_id": doc._id }).updateOne({ "$set": { "foo": doc.foo } });
counter++;
if (counter % 500 == 0) {
bulk.execute(); // send the queued 500 updates
bulk = db.test.initializeOrderedBulkOp(); // start a new batch
}
});
if (counter % 500 != 0 ) bulk.execute();
b) If using MongoDB v3.2.X or above (MongoDB 3.2 deprecated the Bulk() API and provided a newer set of APIs using bulkWrite()):
var largeArray = [{"_id":1, "foo":"bar1Upd"},{"_id":2, "foo":"bar2Upd"}],
bulkUpdateOps = [];
largeArray.forEach(function(doc){
bulkUpdateOps.push({
"updateOne": {
"filter": { "_id": doc._id },
"update": { "$set": { "foo": doc.foo } }
}
});
if (bulkUpdateOps.length === 500) {
db.test.bulkWrite(bulkUpdateOps);
bulkUpdateOps = [];
}
});
if (bulkUpdateOps.length > 0) db.test.bulkWrite(bulkUpdateOps);

How to get pointer of current document to update in updateMany

I have the latest MongoDB 3.2 and a collection of many items that have a timestamp.
I need to convert milliseconds to a Date object, and for now I use this function:
db.myColl.find().forEach(function (doc) {
doc.date = new Date(doc.date);
db.myColl.save(doc);
})
It took a very long time to update 2 million rows.
I tried to use updateMany (it seems very fast), but how can I get access to the current document? Is there any way to rewrite the query above using updateMany?
Thank you.
You can leverage other bulk update APIs like the bulkWrite() method which will allow you to use an iterator to access a document, manipulate it, add the modified document to a list and then send the list of the update operations in a batch to the server for execution.
The following demonstrates this approach: you use the cursor's forEach() method to iterate the collection and modify each document, at the same time pushing the update operation to a batch of about 1000 documents, which can then be updated at once using the bulkWrite() method.
This is as efficient as using the updateMany() since it uses the same underlying bulk write operations:
var cursor = db.myColl.find({"date": { "$exists": true, "$type": 1 }}),
bulkUpdateOps = [];
cursor.forEach(function(doc){
var newDate = new Date(doc.date);
bulkUpdateOps.push({
"updateOne": {
"filter": { "_id": doc._id },
"update": { "$set": { "date": newDate } }
}
});
if (bulkUpdateOps.length == 1000) {
db.myColl.bulkWrite(bulkUpdateOps);
bulkUpdateOps = [];
}
});
if (bulkUpdateOps.length > 0) { db.myColl.bulkWrite(bulkUpdateOps); }
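The per-document step in the loop above can be isolated and checked on its own. This is a small sketch with made-up sample documents; `toDateUpdateOp` is an illustrative helper name, and the millisecond values are assumptions chosen for the example.

```javascript
// Build an updateOne operation that replaces a numeric millisecond
// timestamp stored in `date` with the equivalent Date object.
function toDateUpdateOp(doc) {
  return {
    updateOne: {
      filter: { _id: doc._id },
      update: { $set: { date: new Date(doc.date) } }
    }
  };
}

const sampleDocs = [
  { _id: 1, date: 1451606400000 }, // 2016-01-01T00:00:00Z in milliseconds
  { _id: 2, date: 1454284800000 }  // 2016-02-01T00:00:00Z in milliseconds
];

const dateOps = sampleDocs.map(toDateUpdateOp);
console.log(dateOps[0].updateOne.update.$set.date.toISOString());
```

Each resulting object has the shape bulkWrite() expects, so an array of them can be sent in batches as shown above.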
Your current query is the only way to set a field's value from itself or from another field's value (one could compute some data using more than one field from the document).
There is a way to improve the performance of that query: execute it via the mongo shell directly on the server, so that no data is passed to a client.

Making a collection from collection subset in mongodb

I have a huge collection of documents (more than two million) and I found myself querying a very small subset, using something like
scs = db.balance_sheets.find({"9087n":{$gte:40}, "20/58n":{ $lte:40000000}})
which gives less than 5k results. The question is, can I create a new collection with the results of this query?
I'd tried insert:
db.scs.insert(db.balance_sheets.find({"9087n":{$gte:40}, "20/58n":{ $lte:40000000}}).toArray())
But it gives me errors: Socket say send() errno:32 Broken pipe 127.0.0.1:27017
I tried aggregate:
db.balance_sheets.aggregate([{ "9087n":{$gte:40}, "20/58n":{ $lte:40000000}} ,{$out:"pme"}])
And I get "exception: A pipeline stage specification object must contain exactly one field."
Any hints?
Thanks
The first option would be:
var cursor = db.balance_sheets.find({"9087n":{"$gte": 40}, "20/58n":{ $lte:40000000}});
while (cursor.hasNext()) {
var doc = cursor.next();
db.pme.save(doc);
};
As for the aggregation, try
db.balance_sheets.aggregate([
{
"$match": { "9087n": { "$gte": 40 }, "20/58n": { "$lte": 40000000 } }
},
{ "$out": "pme" }
]);
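The error in the question comes from passing the bare query object as a pipeline stage: each stage must be an object with exactly one top-level key such as $match or $out. A rough sketch of that rule in plain JavaScript follows; `firstMalformedStage` is a hypothetical helper for illustration, not a MongoDB function.

```javascript
// Return the index of the first stage that does not have exactly one
// top-level key, or -1 when every stage is well formed.
function firstMalformedStage(pipeline) {
  return pipeline.findIndex(stage => Object.keys(stage).length !== 1);
}

const subsetQuery = { "9087n": { $gte: 40 }, "20/58n": { $lte: 40000000 } };

// The pipeline from the question: the first element has two keys.
const brokenPipeline = [subsetQuery, { $out: "pme" }];
// The corrected pipeline wraps the query in a $match stage.
const fixedPipeline = [{ $match: subsetQuery }, { $out: "pme" }];

console.log(firstMalformedStage(brokenPipeline)); // 0
console.log(firstMalformedStage(fixedPipeline));  // -1
```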
For improved performance, especially when dealing with large collections, use the Bulk API: the operations are sent to the server in batches of, say, 500, so you make one request per 500 operations instead of one request per document.
The following demonstrates this approach, the first example uses the Bulk API available in MongoDB versions >= 2.6 and < 3.2 to insert all the documents matching the query from the balance_sheets collection into the pme collection:
var bulk = db.pme.initializeUnorderedBulkOp(),
counter = 0;
db.balance_sheets.find({
"9087n": {"$gte": 40},
"20/58n":{ "$lte":40000000}
}).forEach(function (doc) {
bulk.insert(doc);
counter++;
if (counter % 500 == 0) {
bulk.execute(); // Execute per 500 operations
// and re-initialize every 500 insert statements
bulk = db.pme.initializeUnorderedBulkOp();
}
})
// Clean up remaining operations in queue
if (counter % 500 != 0) { bulk.execute(); }
The next example applies to MongoDB 3.2 and later, which deprecated the Bulk API and provided a newer set of APIs using bulkWrite():
var bulkOps = db.balance_sheets.find({
"9087n": { "$gte": 40 },
"20/58n": { "$lte": 40000000 }
}).map(function (doc) {
return { "insertOne" : { "document": doc } };
});
db.pme.bulkWrite(bulkOps);

How to change date and time in MongoDB? [duplicate]

This question already has answers here:
Update MongoDB field using value of another field
(12 answers)
Closed 5 years ago.
I have a DB with names and dates. I need to replace each old date with the date that is 3 days after it. For example, oldDate is 01.02.2015 and the new one is 03.02.2015.
I tried just setting another date for all documents, but that means all exams end up on the same day.
$ db.getCollection('school.exam').update( {}, { $set : { "oldDay" : new ISODate("2016-01-11T03:34:54Z") } }, true, true);
The problem is just to replace old date with some random days.
Since MongoDB doesn't yet support applying the $inc operator to dates (view the JIRA ticket on that here), the alternative for incrementing the date field is to iterate the cursor returned by the find() method using the forEach() method. In the loop, convert the old date field to a timestamp, add the number of days in milliseconds to the timestamp, and then update the field using the $set operator.
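The date arithmetic itself is small enough to check in isolation. A minimal sketch, where `addDays` and `DAY_MS` are illustrative names:

```javascript
const DAY_MS = 24 * 60 * 60 * 1000; // 86400000 milliseconds in one day

// Return a new Date that is `days` days after `date`;
// the input Date object is left unchanged.
function addDays(date, days) {
  return new Date(date.getTime() + days * DAY_MS);
}

const oldDay = new Date("2015-02-01T00:00:00Z");
console.log(addDays(oldDay, 3).toISOString()); // 2015-02-04T00:00:00.000Z
```

The value returned here is what would be written back with $set for each document.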
Use the Bulk API for the updates: the operations are sent to the server in batches of, say, 1000, so you make one request per 1000 operations instead of one request per document.
The following demonstrates this approach. The first example uses the Bulk API available in MongoDB versions >= 2.6 and < 3.2. It updates all the documents in the collection by adding 3 days to the date field:
var bulk = db.getCollection("school.exam").initializeUnorderedBulkOp(),
counter = 0,
daysInMilliSeconds = 86400000,
numOfDays = 3;
// BSON type 9 is Date, which is what getTime() below requires
db.getCollection("school.exam").find({ "oldDay": { $exists : true, "$type": 9 }}).forEach(function (doc) {
var incDate = new Date(doc.oldDay.getTime() + (numOfDays * daysInMilliSeconds ));
bulk.find({ "_id": doc._id }).updateOne({
"$set": { "oldDay": incDate }
});
counter++;
if (counter % 1000 == 0) {
bulk.execute(); // Execute per 1000 operations and re-initialize every 1000 update statements
bulk = db.getCollection('school.exam').initializeUnorderedBulkOp();
}
})
if (counter % 1000 != 0) { bulk.execute(); }
The next example applies to MongoDB 3.2 and later, which deprecated the Bulk API and provided a newer set of APIs using bulkWrite():
var bulkOps = [],
daysInMilliSeconds = 86400000,
numOfDays = 3;
// BSON type 9 is Date, which is what getTime() below requires
db.getCollection("school.exam").find({ "oldDay": { $exists : true, "$type": 9 }}).forEach(function (doc) {
var incDate = new Date(doc.oldDay.getTime() + (numOfDays * daysInMilliSeconds ));
bulkOps.push(
{
"updateOne": {
"filter": { "_id": doc._id } ,
"update": { "$set": { "oldDay": incDate } }
}
}
);
})
db.getCollection("school.exam").bulkWrite(bulkOps, { 'ordered': true });