How to change date and time in MongoDB? [duplicate]

This question already has answers here:
Update MongoDB field using value of another field
(12 answers)
Closed 5 years ago.
I have a DB with names and dates. I need to change each old date to the date that is 3 days after it. For example, if oldDate is 01.02.2015, the new one is 04.02.2015.
I was trying just to put another date for all records, but that means all exams are going to be on one day.
$ db.getCollection('school.exam').update( {}, { $set : { "oldDay" : new ISODate("2016-01-11T03:34:54Z") } }, true, true);
The problem is to shift each old date by some number of days, rather than replacing all of them with one fixed date.

Since MongoDB doesn't yet support applying the $inc operator to dates (see the JIRA ticket on that), as an alternative you would need to increment the date field by iterating the cursor returned by the find() method using the forEach() method; in the loop, convert the old date field to a timestamp, add the number of days in milliseconds to the timestamp, and then update the field using the $set operator.
Take advantage of the Bulk API for bulk updates, which offers better performance: the operations are sent to the server in batches of, say, 1000, so you make just one round trip per 1000 requests instead of one per update.
The following demonstrates this approach. The first example uses the Bulk API, available in MongoDB versions >= 2.6 and < 3.2. It updates all the documents in the collection by adding 3 days to the date field:
var bulk = db.getCollection("school.exam").initializeUnorderedBulkOp(),
    counter = 0,
    daysInMilliSeconds = 86400000,
    numOfDays = 3;

// $type 9 = Date; the getTime() call below requires an actual date value
db.getCollection("school.exam").find({ "oldDay": { "$exists": true, "$type": 9 } }).forEach(function (doc) {
    var incDate = new Date(doc.oldDay.getTime() + (numOfDays * daysInMilliSeconds));
    bulk.find({ "_id": doc._id }).updateOne({
        "$set": { "oldDay": incDate }
    });
    counter++;
    if (counter % 1000 == 0) {
        bulk.execute(); // Execute per 1000 operations and re-initialize every 1000 update statements
        bulk = db.getCollection("school.exam").initializeUnorderedBulkOp();
    }
})
// Clean up remaining operations in the queue
if (counter % 1000 != 0) { bulk.execute(); }
The next example applies to MongoDB version 3.2, which deprecated the Bulk API and provides a newer set of APIs using bulkWrite():
var bulkOps = [],
    daysInMilliSeconds = 86400000,
    numOfDays = 3;

db.getCollection("school.exam").find({ "oldDay": { "$exists": true, "$type": 9 } }).forEach(function (doc) {
    var incDate = new Date(doc.oldDay.getTime() + (numOfDays * daysInMilliSeconds));
    bulkOps.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": { "$set": { "oldDay": incDate } }
        }
    });
})
db.getCollection("school.exam").bulkWrite(bulkOps, { "ordered": true });

Related

Mongodb Aggregation slow count using facet

I want to use a facet to create a simple query that I can use to get paged data; however, I have noticed that if I do this I get really poor performance compared to running just two separate queries.
As a quick test I created a collection with 50000 random documents and ran the following test.
var x = new Date();
var a = {
    count: db.getCollection("test").find({}).count(),
    data: db.getCollection("test").find({}).skip(0).limit(10)
};
var y = new Date();
print('result ' + a);
print(y - x);
var x = new Date();
var a = db.getCollection("test").aggregate([
    { "$match": {} },
    { "$facet": {
        "data": [
            { "$skip": 0 },
            { "$limit": 10 }
        ],
        "pageInfo": [
            { "$group": { "_id": null, "count": { "$sum": 1 } } }
        ]
    }}
])
var y = new Date();
print('result ' + a);
print(y - x);
The result of this is that the two separate queries (one for the find, the other for the count) take around 2 milliseconds, vs the single aggregation query taking upwards of 500 milliseconds.
Why is it that the aggregation is so slow?
Update
Even just a count, without a facet, within an aggregation is slow:
var x = new Date();
var a = db.getCollection("test").find({}).count();
var y = new Date();
print('result ' + a);
print(y - x);
var x = new Date();
var a = db.getCollection("test").aggregate([
    { "$count": "count" }
])
var y = new Date();
print('result ' + a);
print(y - x);
In the above, with my test data set, the aggregation count takes 200ms vs the count() method taking 2ms.
This issue extends into the NodeJs MongoDB driver, where the .count() method has been deprecated and replaced with a countDocuments() method. Under the hood the new countDocuments() method uses an aggregation rather than the count command on a find, and just like my example above it has significantly worse performance, to the point where I will continue using the deprecated method over the newer countDocuments() method.
Of course it is slow. The count() method just returns the cursor size after a query is applied (which does not necessarily require all documents to be read, depending on your query and indices). Furthermore, with an empty query, the query optimizer knows that all documents ought to be returned and basically only has to return the size of the _id index.
Aggregations, by definition, do not work that way. Unless there is a match stage actually ruling out a document, each and every document is read from "disk" (MongoDB's own cache and FS caches aside for the moment) for further processing.
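If you want to verify that explanation, the query planner output makes the difference visible. A quick sketch (assuming the same test collection as above):
// The count command can answer from collection metadata / the _id index, while
// the $count aggregation plan shows a full COLLSCAN feeding a COUNT stage.
db.getCollection("test").explain("executionStats").count({})
db.getCollection("test").explain("executionStats").aggregate([{ "$count": "count" }])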
I am running into the same issue, and I just hope that someone might have a better answer than what was previously posted.
I have a "user" collection with 12 million users in it, using MongoDB 5.0.
My query looks like this:
db.users.aggregate([
    { "$sort": { "updated_at": -1 } },
    { "$facet": {
        "results": [
            { "$skip": 0 },
            { "$limit": 20 }
        ],
        "total": [
            { "$count": "count" }
        ]
    }}
])
The query takes around 1 minute, so that is not acceptable.
I have an index on "updated_at", that is not the issue.
Also, I have this issue even if I run it directly in the mongo shell in Compass, so it is not related to any NodeJs Mongo Driver as was previously suspected.
Can I somehow tell Mongo to use the estimated count here?
Or is there any other way to improve the query?
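If an approximate total is acceptable, one option (a hedged sketch, not a drop-in fix) is to skip the $facet and pair the page query with estimatedDocumentCount(), which reads the count from collection metadata rather than scanning; note it cannot take a filter, so it only fits whole-collection totals:
// Sketch: fetch the page and the total as two cheap operations instead of one $facet.
// estimatedDocumentCount() answers from collection metadata (no scan), but it
// ignores any query filter -- it is always the whole-collection count.
var page = db.users.find({}).sort({ "updated_at": -1 }).limit(20).toArray();
var total = db.users.estimatedDocumentCount();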

Compare Time to Document Interval and Update

Use Case:
I've got a mongodb collection with a couple million documents. Documents in this collection must be updated sometimes. Therefore I've set up a monitorFrequency field, which defines that a specific document must be updated every 6, 12, 24 or 720 hours. Additionally I set up a field called lastRefreshAt, which is a timestamp of the last actual update.
The problem:
How can I select all documents from my collection profiles which need to be refreshed again (because more time than monitorFrequency allows has passed since lastRefreshAt)?
Should I run a single query which would only return those documents which need to be refreshed, or should I rather iterate over all documents with a cursor and check in my node application whether each document needs to be refreshed or not?
I would know how to do approach #2, but I am not sure which approach to choose or how the query for #1 would look.
There are a couple of approaches depending on available architecture and choices. Some are good choices and some are bad, but we might as well explain them all.
Use $where with multi-update
As a first option to examine, you could use $where to calculate the difference for selection and feed it directly to .update(), or .updateMany() for that matter:
db.profiles.update(
    {
        "$where": function() {
            return (Date.now() - this.lastRefreshAt.valueOf())
                > (this.monitorFrequency * 1000 * 60 * 60);
        }
    },
    { "$currentDate": { "lastRefreshAt": true } },
    { "multi": true }
)
Which pretty simply works out the milliseconds difference between the current "lastRefreshAt" value and the current Date value and compares that to the stored "monitorFrequency" converted into milliseconds itself.
The $currentDate is applied because it is a "multi" update applied to all matched documents, so this ensures the "server timestamp" at the actual time of the document update is applied to the document.
It's not fantastic as it does require a full collection scan in order to select the documents via calculation and thus cannot use an index. Plus it's JavaScript evaluation, which not being native code does add some overhead.
Loop the matched selection
So JavaScript is not that great a selection option in general when other options apply. Instead try using the aggregation framework for the calculation and loop the cursor result:
var ops = [];

db.profiles.aggregate([
    { "$redact": {
        "$cond": {
            "if": {
                "$gt": [
                    { "$subtract": [new Date(), "$lastRefreshAt"] },
                    { "$multiply": ["$monitorFrequency", 1000 * 60 * 60] }
                ]
            },
            "then": "$$KEEP",
            "else": "$$PRUNE"
        }
    }}
]).forEach(doc => {
    ops.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": { "$currentDate": { "lastRefreshAt": true } }
        }
    });
    if (ops.length > 1000) {
        db.profiles.bulkWrite(ops);
        ops = [];
    }
})

if (ops.length > 0) {
    db.profiles.bulkWrite(ops);
    ops = [];
}
So again that's a collection scan due to the calculation but it is done with native operators, so that part at least should be a bit faster. Also from a technical standpoint it's a little different because the new Date() is actually established at the time of request and not per document iterated as it would be using $where. Lacking an operator to produce the "current date" internally, there is no way for the aggregation framework to do this per iteration.
And of course, instead of just applying our "update" expression as it matches documents, we are looping the result cursor and applying a function. So whilst there are "some" gains, there is also additional overhead. Mileage may vary as to performance and practicality.
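Worth noting: newer servers do have such an operator. On MongoDB 4.2+ the aggregation variable $$NOW yields the current datetime server-side, and updates can take a pipeline, so the whole select-and-stamp can collapse into one command. A rough sketch, assuming a 4.2+ deployment:
// Sketch (MongoDB 4.2+): the pipeline-style update stamps lastRefreshAt with the
// server-side $$NOW variable, so no client-side loop is needed.
db.profiles.updateMany(
    { "$expr": {
        "$gt": [
            { "$subtract": [new Date(), "$lastRefreshAt"] },
            { "$multiply": ["$monitorFrequency", 1000 * 60 * 60] }
        ]
    }},
    [ { "$set": { "lastRefreshAt": "$$NOW" } } ]
)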
Parallel Updates
Personally I would do neither of the above and simply run a query selecting each marked "monitorFrequency" and looking for the dates between the boundaries that exceed the allowed difference.
As a simple example using NodeJS to implement Promise.all() for parallel calls:
const MongoClient = require('mongodb').MongoClient;

const oneHour = 1000 * 60 * 60;

(async function() {

    let db;

    try {
        db = await MongoClient.connect('mongodb://localhost/test');

        let collection = db.collection('profiles');
        let intervals = [6, 12, 24, 720];
        let snapDate = new Date();

        await Promise.all(
            intervals.map((monitorFrequency, i) =>
                collection.updateMany(
                    {
                        monitorFrequency,
                        "lastRefreshAt": Object.assign(
                            { "$lt": new Date(snapDate.valueOf() - intervals[i] * oneHour) },
                            // bounded below by the next interval's threshold (none for the last interval)
                            (i < intervals.length - 1)
                                ? { "$gt": new Date(snapDate.valueOf() - intervals[i + 1] * oneHour) }
                                : {}
                        )
                    },
                    { "$currentDate": { "lastRefreshAt": true } }
                )
            )
        );

    } catch (e) {
        console.error(e);
    } finally {
        db.close();
    }

})();
This would allow you to index on the two fields and allow optimal selection, and since the "date ranges" are paired to their calculated difference from "monitorFrequency" then those documents that "require refresh" are the only ones that get selected for update.
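For that selection to be indexed as described, a compound index covering both fields is the natural candidate, e.g.:
// Compound index so each { monitorFrequency, lastRefreshAt } range query
// in the parallel updates can be resolved without a collection scan.
db.profiles.createIndex({ "monitorFrequency": 1, "lastRefreshAt": 1 })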
Given the finite number of possible intervals, this is what I would suspect to be the most optimal solution. But the construction, along with the fact that the actual "update" portion remains consistent for each selection, leads to one other option.
Use $or for each selection.
Much the same logic as above, but instead applied to build an $or condition for the "query" portion of a "single" update. It is an "array of criteria" after all, which is essentially the same as an "array of queries", which is what we are doing above. So just turn it around a little:
let oneHour = 1000 * 60 * 60;
let intervals = [6, 12, 24, 720];
let snapDate = new Date();

db.profiles.updateMany(
    {
        "$or": intervals.map((monitorFrequency, i) =>
            ({
                monitorFrequency,
                "lastRefreshAt": Object.assign(
                    { "$lt": new Date(snapDate.valueOf() - intervals[i] * oneHour) },
                    (i < intervals.length - 1)
                        ? { "$gt": new Date(snapDate.valueOf() - intervals[i + 1] * oneHour) }
                        : {}
                )
            })
        )
    },
    { "$currentDate": { "lastRefreshAt": true } }
)
This then becomes one simple statement and of course can actually use indexes where available. Generally this is what you should be doing, though as I have suggested my intuition tells me that 4 threads of execution constrained only by the slowest one gets the job done slightly faster. Again, mileage may vary on that but logic dictates that this is so.
So the basic lesson here is that whilst you may think the logical approach is to calculate the values and compare within the database itself, it's actually the worst possible thing you can do for query performance.
The simple approach is to work out the criteria that should select the documents you want before you issue the query statement to the server. This means you are comparing "concrete values" rather than "calculation results". And concrete values can actually be indexed, which is generally what you want for database queries.

How to get pointer of current document to update in updateMany

I have the latest MongoDB 3.2 and a collection of many items that have a timestamp.
I need to convert milliseconds to a Date object, and now I use this function:
db.myColl.find().forEach(function (doc) {
    doc.date = new Date(doc.date);
    db.myColl.save(doc);
})
It takes a very long time to update 2 million documents.
I tried to use updateMany (it seems to be very fast), but how can I get access to the current document? Is there any chance to rewrite the query above using updateMany?
Thank you.
You can leverage other bulk update APIs like the bulkWrite() method, which allows you to use an iterator to access a document, manipulate it, add the modified document's update to a list, and then send the list of update operations in a batch to the server for execution.
The following demonstrates this approach, in which you use the cursor's forEach() method to iterate the collection and modify each document, at the same time pushing the update operation into a batch of about 1000 operations which can then be written at once using the bulkWrite() method.
This is as efficient as using updateMany(), since it uses the same underlying bulk write operations:
var cursor = db.myColl.find({ "date": { "$exists": true, "$type": 1 } }),
    bulkUpdateOps = [];

cursor.forEach(function(doc) {
    var newDate = new Date(doc.date);
    bulkUpdateOps.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": { "$set": { "date": newDate } }
        }
    });
    if (bulkUpdateOps.length == 1000) {
        db.myColl.bulkWrite(bulkUpdateOps);
        bulkUpdateOps = [];
    }
});

if (bulkUpdateOps.length > 0) { db.myColl.bulkWrite(bulkUpdateOps); }
The query above is the only solution when you need to set a field's value from the document itself or from another field (you could compute something using more than one field of the document).
There is a way to improve the performance of that query: execute it via the mongo shell directly on the server, so that no data is passed to the client.
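For what it's worth, later server versions added exactly the capability asked about here: from MongoDB 4.2 on, updateMany() accepts an aggregation pipeline as its update document, and that pipeline can reference the current document's own fields. A sketch, assuming date still holds the millisecond timestamp as a number:
// Sketch (MongoDB 4.2+): the pipeline form of updateMany can read "$date",
// so the conversion happens server-side in a single command.
// $toDate accepts a number of milliseconds since the epoch.
db.myColl.updateMany(
    { "date": { "$type": "double" } },
    [ { "$set": { "date": { "$toDate": "$date" } } } ]
)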

Making a collection from collection subset in mongodb

I have a huge collection of documents (more than two million) and I found myself querying a very small subset, using something like
scs = db.balance_sheets.find({"9087n":{$gte:40}, "20/58n":{ $lte:40000000}})
which gives fewer than 5k results. The question is, can I create a new collection with the results of this query?
I tried insert:
db.scs.insert(db.balance_sheets.find({"9087n":{$gte:40}, "20/58n":{ $lte:40000000}}).toArray())
But it gives me errors: Socket say send() errno:32 Broken pipe 127.0.0.1:27017
I tried aggregate:
db.balance_sheets.aggregate([{ "9087n":{$gte:40}, "20/58n":{ $lte:40000000}} ,{$out:"pme"}])
And I get "exception: A pipeline stage specification object must contain exactly one field."
Any hints?
Thanks
The first option would be:
var cursor = db.balance_sheets.find({ "9087n": { "$gte": 40 }, "20/58n": { "$lte": 40000000 } });
while (cursor.hasNext()) {
    var doc = cursor.next();
    db.pme.save(doc);
};
As for the aggregation, try
db.balance_sheets.aggregate([
{
"$match": { "9087n": { "$gte": 40 }, "20/58n": { "$lte": 40000000 } }
},
{ "$out": "pme" }
]);
For improved performance, especially when dealing with large collections, take advantage of the Bulk API for bulk writes: the operations are sent to the server in batches of, say, 500, which gives you better performance as you are not sending every request to the server, just one request per 500 operations.
The following demonstrates this approach. The first example uses the Bulk API, available in MongoDB versions >= 2.6 and < 3.2, to insert all the documents matching the query from the balance_sheets collection into the pme collection:
var bulk = db.pme.initializeUnorderedBulkOp(),
    counter = 0;

db.balance_sheets.find({
    "9087n": { "$gte": 40 },
    "20/58n": { "$lte": 40000000 }
}).forEach(function (doc) {
    bulk.insert(doc);
    counter++;
    if (counter % 500 == 0) {
        bulk.execute(); // Execute per 500 operations
        // and re-initialize every 500 insert statements
        bulk = db.pme.initializeUnorderedBulkOp();
    }
})

// Clean up remaining operations in queue
if (counter % 500 != 0) { bulk.execute(); }
The next example applies to MongoDB version 3.2, which deprecated the Bulk API and provides a newer set of APIs using bulkWrite():
var bulkOps = db.balance_sheets.find({
    "9087n": { "$gte": 40 },
    "20/58n": { "$lte": 40000000 }
}).map(function (doc) {
    return { "insertOne": { "document": doc } };
});

db.pme.bulkWrite(bulkOps);
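If you are on MongoDB 4.2 or newer, the $merge stage is an alternative to the $out approach shown earlier; unlike $out it does not replace the target collection wholesale, so it can top up an existing pme collection. A sketch:
db.balance_sheets.aggregate([
    { "$match": { "9087n": { "$gte": 40 }, "20/58n": { "$lte": 40000000 } } },
    // $merge (4.2+) upserts into the target instead of replacing it like $out
    { "$merge": { "into": "pme" } }
])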

Converting string to date in mongodb

Is there a way to convert a string to a date using a custom format in the mongodb shell?
I am trying to convert "21/May/2012:16:35:33 -0400" to a date.
Is there a way to pass a DateFormatter or something to
Date.parse(...) or the ISODate(....) method?
Using MongoDB 4.0 and newer
The $toDate operator will convert the value to a date. If the value cannot be converted to a date, $toDate errors. If the value is null or missing, $toDate returns null.
You can use it within an aggregate pipeline as follows:
db.collection.aggregate([
    { "$addFields": {
        "created_at": { "$toDate": "$created_at" }
    }}
])
The above is equivalent to using the $convert operator as follows:
db.collection.aggregate([
    { "$addFields": {
        "created_at": {
            "$convert": { "input": "$created_at", "to": "date" }
        }
    }}
])
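A practical difference worth knowing: unlike the bare $toDate above, $convert also accepts onError and onNull options, so documents with unparseable or missing values don't abort the whole pipeline. A sketch:
db.collection.aggregate([
    { "$addFields": {
        "created_at": {
            "$convert": {
                "input": "$created_at",
                "to": "date",
                "onError": null,   // result to use when a value cannot be parsed
                "onNull": null     // result to use when the input is null or missing
            }
        }
    }}
])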
Using MongoDB 3.6 and newer
You can also use the $dateFromString operator, which converts the date/time string to a date object and has options for specifying the date format as well as the timezone:
db.collection.aggregate([
    { "$addFields": {
        "created_at": {
            "$dateFromString": {
                "dateString": "$created_at",
                "format": "%m-%d-%Y" /* <-- option available only in version 4.0 and newer */
            }
        }
    }}
])
Using MongoDB versions >= 2.6 and < 3.2
If your MongoDB version does not have the native operators that do the conversion, you would need to manually iterate the cursor returned by the find() method, using either the forEach() method
or the cursor method next(), to access the documents. Within the loop, convert the field to an ISODate object and then update the field using the $set operator, as in the following example where the field is called created_at and currently holds the date in string format:
var cursor = db.collection.find({ "created_at": { "$exists": true, "$type": 2 } });
while (cursor.hasNext()) {
    var doc = cursor.next();
    db.collection.update(
        { "_id": doc._id },
        { "$set": { "created_at": new ISODate(doc.created_at) } }
    )
};
For improved performance, especially when dealing with large collections, take advantage of the Bulk API for bulk updates: the operations are sent to the server in batches, which gives you better performance as you are not sending every request to the server, just one request per batch.
The following demonstrates this approach. The first example uses the Bulk API, available in MongoDB versions >= 2.6 and < 3.2. It updates all the documents in the collection by changing the created_at fields to date fields:
var bulk = db.collection.initializeUnorderedBulkOp(),
    counter = 0;

db.collection.find({ "created_at": { "$exists": true, "$type": 2 } }).forEach(function (doc) {
    var newDate = new ISODate(doc.created_at);
    bulk.find({ "_id": doc._id }).updateOne({
        "$set": { "created_at": newDate }
    });
    counter++;
    if (counter % 1000 == 0) {
        bulk.execute(); // Execute per 1000 operations and re-initialize every 1000 update statements
        bulk = db.collection.initializeUnorderedBulkOp();
    }
})

// Clean up remaining operations in queue
if (counter % 1000 != 0) { bulk.execute(); }
Using MongoDB 3.2
The next example applies to MongoDB version 3.2, which deprecated the Bulk API and provides a newer set of APIs using bulkWrite():
var bulkOps = [],
    cursor = db.collection.find({ "created_at": { "$exists": true, "$type": 2 } });

cursor.forEach(function (doc) {
    var newDate = new ISODate(doc.created_at);
    bulkOps.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": { "$set": { "created_at": newDate } }
        }
    });
    if (bulkOps.length === 500) {
        db.collection.bulkWrite(bulkOps);
        bulkOps = [];
    }
});

if (bulkOps.length > 0) db.collection.bulkWrite(bulkOps);
In my case I succeeded with the following solution for converting the field ClockInTime in the ClockTime collection from string to Date type:
db.ClockTime.find().forEach(function(doc) {
    doc.ClockInTime = new Date(doc.ClockInTime);
    db.ClockTime.save(doc);
})
You can use the javascript in the second link provided by Ravi Khakhkhar, or you are going to have to perform some string manipulation to convert your original string (some of the special characters in your original format aren't being recognised as valid delimiters), but once you do that you can use "new":
training:PRIMARY> Date()
Fri Jun 08 2012 13:53:03 GMT+0100 (IST)
training:PRIMARY> new Date()
ISODate("2012-06-08T12:53:06.831Z")
training:PRIMARY> var start = new Date("21/May/2012:16:35:33 -0400") => doesn't work
training:PRIMARY> start
ISODate("0NaN-NaN-NaNTNaN:NaN:NaNZ")
training:PRIMARY> var start = new Date("21 May 2012:16:35:33 -0400") => doesn't work
training:PRIMARY> start
ISODate("0NaN-NaN-NaNTNaN:NaN:NaNZ")
training:PRIMARY> var start = new Date("21 May 2012 16:35:33 -0400") => works
training:PRIMARY> start
ISODate("2012-05-21T20:35:33Z")
Here are some links that you may find useful (regarding modification of the data within the mongo shell):
http://cookbook.mongodb.org/patterns/date_range/
http://www.mongodb.org/display/DOCS/Dates
http://www.mongodb.org/display/DOCS/Overview+-+The+MongoDB+Interactive+Shell
I had some strings stored in MongoDB which had to be reformatted to a proper and valid dateTime field.
Here is my code for the special date format: "2014-03-12T09:14:19.5303017+01:00",
but you can easily take this idea and write your own regex to parse other date formats:
// format: "2014-03-12T09:14:19.5303017+01:00"
var myregexp = /(....)-(..)-(..)T(..):(..):(..)\.(.+)([\+-])(..)/;

db.Product.find().forEach(function(doc) {
    var matches = myregexp.exec(doc.metadata.insertTime);
    if (myregexp.test(doc.metadata.insertTime)) {
        var offset = matches[9] * (matches[8] == "+" ? 1 : -1);
        var hours = matches[4] - (-offset) + 1;
        var date = new Date(matches[1], matches[2] - 1, matches[3], hours, matches[5], matches[6], matches[7] / 10000.0);
        db.Product.update({ _id: doc._id }, { $set: { "metadata.insertTime": date } });
        print("successfully updated");
    } else {
        print("not updated");
    }
})
How about using a library like momentjs by writing a script like this:
[install_moment.js]
function get_moment() {
    // shim to get a UMD module to load as CommonJS
    var module = { exports: {} };
    /*
    copy your favorite UMD module (i.e. moment.js) here
    */
    return module.exports;
}

// load the module generator into the stored procedures:
db.system.js.save({
    _id: "get_moment",
    value: get_moment,
});
Then load the script at the command line like so:
> mongo install_moment.js
Finally, in your next mongo session, use it like so:
// LOAD STORED PROCEDURES
db.loadServerScripts();
// GET THE MOMENT MODULE
var moment = get_moment();
// parse a date-time string
var a = moment("23 Feb 1997 at 3:23 pm","DD MMM YYYY [at] hh:mm a");
// reformat the string as you wish:
a.format("[The] DDD['th day of] YYYY"); // "The 54'th day of 1997"