Converting string to date in mongodb - mongodb

Is there a way to convert a string to a date using a custom format in the MongoDB shell?
I am trying to convert "21/May/2012:16:35:33 -0400" to a date.
Is there a way to pass a DateFormatter or something similar to the
Date.parse(...) or ISODate(....) method?

Using MongoDB 4.0 and newer
The $toDate operator converts a value to a date. If the value cannot be converted to a date, $toDate errors. If the value is null or missing, $toDate returns null.
You can use it within an aggregate pipeline as follows:
db.collection.aggregate([
{ "$addFields": {
"created_at": {
"$toDate": "$created_at"
}
} }
])
The above is equivalent to using the $convert operator as follows:
db.collection.aggregate([
{ "$addFields": {
"created_at": {
"$convert": {
"input": "$created_at",
"to": "date"
}
}
} }
])
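Since $toDate errors on a value it cannot convert, a single bad string fails the whole aggregation. A minimal sketch of a more forgiving variant, using the onError and onNull options of $convert; the choice of null as the fallback value is my assumption, pick whatever suits your data:

```javascript
// Sketch: same conversion as above, but unparseable strings become null
// instead of aborting the pipeline (MongoDB 4.0+).
var convertStage = {
  "$addFields": {
    "created_at": {
      "$convert": {
        "input": "$created_at",
        "to": "date",
        "onError": null, // assumption: fall back to null for bad input
        "onNull": null   // value used when the field is null or missing
      }
    }
  }
};

// In the shell you would then run:
// db.collection.aggregate([ convertStage ])
```

The stage is built as a plain object first so it can be inspected or reused before being passed to aggregate().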
Using MongoDB 3.6 and newer
You can also use the $dateFromString operator, which converts a date/time string to a date object and has options for specifying the date format as well as the timezone:
db.collection.aggregate([
{ "$addFields": {
"created_at": {
"$dateFromString": {
"dateString": "$created_at",
"format": "%m-%d-%Y" /* <-- option available only in version 4.0. and newer */
}
}
} }
])
Using MongoDB versions >= 2.6 and < 3.2
If your MongoDB version does not have the native operators that do the conversion, you need to manually iterate the cursor returned by the find() method, using either the forEach() method or the cursor method next() to access the documents. Within the loop, convert the field to an ISODate object and then update the field using the $set operator, as in the following example, where the field is called created_at and currently holds the date in string format:
var cursor = db.collection.find({"created_at": {"$exists": true, "$type": 2 }});
while (cursor.hasNext()) {
var doc = cursor.next();
db.collection.update(
{"_id" : doc._id},
{"$set" : {"created_at" : new ISODate(doc.created_at)}}
)
};
For better performance, especially with large collections, use the Bulk API for bulk updates: the operations are sent to the server in batches of, say, 1000, so you make one round trip per 1000 updates instead of one per update.
The following demonstrates this approach. The first example uses the Bulk API available in MongoDB versions >= 2.6 and < 3.2. It updates all the documents in the collection by changing the created_at fields to date fields:
var bulk = db.collection.initializeUnorderedBulkOp(),
counter = 0;
db.collection.find({"created_at": {"$exists": true, "$type": 2 }}).forEach(function (doc) {
var newDate = new ISODate(doc.created_at);
bulk.find({ "_id": doc._id }).updateOne({
"$set": { "created_at": newDate}
});
counter++;
if (counter % 1000 == 0) {
bulk.execute(); // Execute per 1000 operations and re-initialize every 1000 update statements
bulk = db.collection.initializeUnorderedBulkOp();
}
})
// Clean up remaining operations in queue
if (counter % 1000 != 0) { bulk.execute(); }
Using MongoDB 3.2
The next example applies to MongoDB version 3.2, which deprecated the Bulk API and provided a newer set of APIs using bulkWrite():
var bulkOps = [],
cursor = db.collection.find({"created_at": {"$exists": true, "$type": 2 }});
cursor.forEach(function (doc) {
var newDate = new ISODate(doc.created_at);
bulkOps.push(
{
"updateOne": {
"filter": { "_id": doc._id } ,
"update": { "$set": { "created_at": newDate } }
}
}
);
if (bulkOps.length === 500) {
db.collection.bulkWrite(bulkOps);
bulkOps = [];
}
});
if (bulkOps.length > 0) db.collection.bulkWrite(bulkOps);
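The batching pattern in the examples above can be separated from the MongoDB calls. A rough sketch follows; flushInBatches and toDateUpdate are hypothetical helper names, and the array of documents stands in for a real cursor. Unlike the loops above, it builds the full list of operations first, so it only suits collections that fit in memory:

```javascript
// Hypothetical helper: hands the operations to `flush` in groups of `batchSize`.
// In the shell, `flush` would be function (ops) { db.collection.bulkWrite(ops); }.
function flushInBatches(operations, batchSize, flush) {
  var batches = 0;
  for (var i = 0; i < operations.length; i += batchSize) {
    flush(operations.slice(i, i + batchSize));
    batches++;
  }
  return batches;
}

// Builds one updateOne operation that replaces the string field with a Date.
function toDateUpdate(doc) {
  return {
    updateOne: {
      filter: { _id: doc._id },
      update: { $set: { created_at: new Date(doc.created_at) } }
    }
  };
}

// Stand-in data instead of a real cursor.
var docs = [
  { _id: 1, created_at: "2012-05-21T20:35:33Z" },
  { _id: 2, created_at: "2012-06-08T12:53:06Z" },
  { _id: 3, created_at: "2013-01-01T00:00:00Z" }
];
var sent = [];
var batchCount = flushInBatches(docs.map(toDateUpdate), 2, function (ops) {
  sent.push(ops.length); // a real flush would call bulkWrite(ops) here
});
```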

In my case, I succeeded with the following solution for converting the ClockInTime field in the ClockTime collection from a string to a Date type:
db.ClockTime.find().forEach(function(doc) {
doc.ClockInTime=new Date(doc.ClockInTime);
db.ClockTime.save(doc);
})

You can use the JavaScript in the second link provided by Ravi Khakhkhar, or you will have to perform some string manipulation to convert your original string (some of the special characters in your original format aren't recognised as valid delimiters). Once you do that, you can use new Date():
training:PRIMARY> Date()
Fri Jun 08 2012 13:53:03 GMT+0100 (IST)
training:PRIMARY> new Date()
ISODate("2012-06-08T12:53:06.831Z")
training:PRIMARY> var start = new Date("21/May/2012:16:35:33 -0400") // doesn't work
training:PRIMARY> start
ISODate("0NaN-NaN-NaNTNaN:NaN:NaNZ")
training:PRIMARY> var start = new Date("21 May 2012:16:35:33 -0400") // doesn't work
training:PRIMARY> start
ISODate("0NaN-NaN-NaNTNaN:NaN:NaNZ")
training:PRIMARY> var start = new Date("21 May 2012 16:35:33 -0400") // works
training:PRIMARY> start
ISODate("2012-05-21T20:35:33Z")
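If you would rather not depend on what the shell's Date parser happens to accept, the string can be taken apart by hand. A minimal sketch, assuming the input always looks like "dd/Mon/yyyy:HH:mm:ss ±hhmm" with English month abbreviations; parseLogDate and MONTHS are names I made up for illustration:

```javascript
// Month abbreviations mapped to JavaScript's zero-based month numbers.
var MONTHS = {
  Jan: 0, Feb: 1, Mar: 2, Apr: 3, May: 4, Jun: 5,
  Jul: 6, Aug: 7, Sep: 8, Oct: 9, Nov: 10, Dec: 11
};

// Hypothetical helper: returns a Date, or null if the string does not match.
function parseLogDate(s) {
  var m = /^(\d{2})\/([A-Za-z]{3})\/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})$/.exec(s);
  if (!m || !MONTHS.hasOwnProperty(m[2])) {
    return null;
  }
  // Offset of the local wall-clock time from UTC, in minutes (negative for -0400).
  var offsetMinutes = (m[7] === "+" ? 1 : -1) * (Number(m[8]) * 60 + Number(m[9]));
  // Treat the wall-clock fields as UTC first, then subtract the offset to get real UTC.
  var wallClockAsUtc = Date.UTC(Number(m[3]), MONTHS[m[2]], Number(m[1]),
                                Number(m[4]), Number(m[5]), Number(m[6]));
  return new Date(wallClockAsUtc - offsetMinutes * 60000);
}

var parsed = parseLogDate("21/May/2012:16:35:33 -0400");
```

The result can then be written back with $set, as in the update loops shown in the other answers.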
Here are some links that you may find useful (regarding modification of the data within the mongo shell):
http://cookbook.mongodb.org/patterns/date_range/
http://www.mongodb.org/display/DOCS/Dates
http://www.mongodb.org/display/DOCS/Overview+-+The+MongoDB+Interactive+Shell

I had some strings stored in MongoDB which had to be reformatted into a proper, valid dateTime field.
Here is my code for the special date format "2014-03-12T09:14:19.5303017+01:00",
but you can easily take this idea and write your own regex to parse other date formats:
// format: "2014-03-12T09:14:19.5303017+01:00"
var myregexp = /(....)-(..)-(..)T(..):(..):(..)\.(.+)([\+-])(..)/;
db.Product.find().forEach(function(doc) {
  var matches = myregexp.exec(doc.metadata.insertTime);
  if (matches) { // exec() returns null when the string does not match
    var offset = matches[9] * (matches[8] == "+" ? 1 : -1);
    // new Date(year, month, ...) interprets its arguments in the shell's local
    // time zone, so check this hour adjustment against your own environment
    var hours = matches[4]-(-offset)+1;
    var date = new Date(matches[1], matches[2]-1, matches[3], hours, matches[5], matches[6], matches[7] / 10000.0);
    db.Product.update({_id : doc._id}, {$set : {"metadata.insertTime" : date}});
    print("successfully updated");
  } else {
    print("not updated");
  }
});
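For this particular format there may be a shorter route: apart from the seven-digit fraction, the string is already an ISO 8601 timestamp with an offset. A sketch, assuming the shell's Date parser accepts such strings (current JavaScript engines do); parseIsoWithLongFraction is a hypothetical name:

```javascript
// Hypothetical helper: trims the fractional seconds to milliseconds,
// then lets the built-in parser handle the rest, including the offset.
function parseIsoWithLongFraction(s) {
  var trimmed = s.replace(/(\.\d{3})\d+/, "$1");
  var date = new Date(trimmed);
  return isNaN(date.getTime()) ? null : date;
}

var insertTime = parseIsoWithLongFraction("2014-03-12T09:14:19.5303017+01:00");
```

Because the offset is part of the parsed string, the result does not depend on the time zone of the machine running the shell.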

How about using a library like momentjs by writing a script like this:
[install_moment.js]
function get_moment(){
// shim to get UMD module to load as CommonJS
var module = {exports:{}};
/*
copy your favorite UMD module (i.e. moment.js) here
*/
return module.exports
}
//load the module generator into the stored procedures:
db.system.js.save( {
_id:"get_moment",
value: get_moment,
});
Then load the script at the command line like so:
> mongo install_moment.js
Finally, in your next mongo session, use it like so:
// LOAD STORED PROCEDURES
db.loadServerScripts();
// GET THE MOMENT MODULE
var moment = get_moment();
// parse a date-time string
var a = moment("23 Feb 1997 at 3:23 pm","DD MMM YYYY [at] hh:mm a");
// reformat the string as you wish:
a.format("[The] DDD['th day of] YYYY"); // "The 54'th day of 1997"

Related

Mongodb Aggregation slow count using facet

I want to use a facet to create a simple query that I can use to get paged data. However, I have noticed that if I do this I get really poor performance compared to running just two separate queries.
As a quick test, I created a collection with 50000 random documents and ran the following test.
var x = new Date();
var a = {
count : db.getCollection("test").find({}).count(),
data: db.getCollection("test").find({}).skip(0).limit(10)
};
var y = new Date();
print('result ' + a);
print(y - x);
var x = new Date();
var a = db.getCollection("test").aggregate(
[
{
"$match" : {
}
},
{
"$facet" : {
"data": [
{
"$skip": 0
},
{
"$limit": 10
}
],
"pageInfo": [
{
"$group": {
"_id": null,
"count": {
"$sum": 1
}
}
}
]
}
}
]
)
var y = new Date();
print('result ' + a);
print(y - x);
The result is that two separate queries, one for the find and the other for the count, take around 2 milliseconds, while the single aggregation query takes upwards of 500 milliseconds.
Why is it that the aggregation is so slow?
Update
Even just a count without a facet within an aggregation is slow
var x = new Date();
var a = db.getCollection("test").find({}).count();
var y = new Date();
print('result ' + a);
print(y - x);
var x = new Date();
var a = db.getCollection("test").aggregate(
[
{ "$count" : "count" }
]
)
var y = new Date();
print('result ' + a);
print(y - x);
In the above, with my test data set, the aggregation count takes 200ms vs. the count() method taking 2ms.
This issue extends into the Node.js MongoDB driver, where the count() method has been deprecated and replaced with countDocuments(). Under the hood, the new countDocuments() method uses an aggregation rather than the count method on a find. Just like my example above, it has significantly worse performance, to the point that I will continue using the deprecated method over the newer countDocuments() method.
Of course it is slow. The count() method just returns the cursor size after a query is applied (which does not necessarily require all documents to be read, depending on your query and indices). Furthermore, with an empty query, the query optimizer knows that all documents ought to be returned and basically only has to return length(_id_1).
Aggregations, by definition, do not work that way. Unless there is a match stage actually ruling out a document, each and every document is read from “disk” (MongoDB’s own cache and FS caches aside for the moment) for further processing.
I am running into the same issue, and I just hope that someone has a better answer than what was previously posted.
I have a "user" collection with 12 million users in it, using MongoDB 5.0.
My query looks like this:
db.users.aggregate([
{ '$sort': { updated_at: -1 } },
{ '$facet': {
results: [
{ $skip: 0 },
{ $limit: 20 }
],
total: [
{ $count: 'count' }
]
}
}
])
The query takes around 1 minute, so that is not acceptable.
I have an index on "updated_at", that is not the issue.
Also, I have this issue even if I run it directly on MongoShell in Compass. So it is not related to any NodeJs Mongo Driver as was previously suspected.
Can I somehow tell Mongo to use the estimated count here?
Or is there any other way to improve the query?
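One direction that might be worth trying, not a confirmed fix: drop the $facet and go back to two separate calls, as in the original question, with estimatedDocumentCount() supplying the total. A sketch, assuming a shell version that has estimatedDocumentCount(); fetchPage is a hypothetical helper name:

```javascript
// Hypothetical helper: `collection` is a shell collection such as db.users.
function fetchPage(collection, skip, limit) {
  return {
    // The sort can use the updated_at index, and only one page is read.
    results: collection.find({}).sort({ updated_at: -1 }).skip(skip).limit(limit).toArray(),
    // Taken from collection metadata rather than by scanning documents.
    total: collection.estimatedDocumentCount()
  };
}

// Usage in the shell:
// var page = fetchPage(db.users, 0, 20);
```

Note that the estimated count always covers the whole collection, so it only fits an unfiltered listing like the one above.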

How to get ISO string in Nifi getMongo Query Field

I'm trying to use expression language to generate an ISO string in the NiFi GetMongo Query field using the following query:
{
"remindmeDate": {
"$gte": "${now():format("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'",'GMT')}",
"$lte": "${now():toNumber():plus(359999):format("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'",'GMT')}"
}
}
But I'm getting an invalid JSON error because the double quotes are not escaped. When we try to escape them using the \ operator, NiFi does not evaluate the expression language. Is there any method or workaround to get this working?
Thanks in advance
The GetMongo processor of NiFi requires your query to be in MongoDB's extended JSON format. So you can use a query of the following format to query MongoDB based on datetime:
{"bday":{"$gt":{"$date":"2014-01-01T05:00:00.000Z"}, "$lt":{"$date":"2019-01-01T05:00:00.000Z"}}}
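If the query text is produced by a script before it reaches NiFi, the extended JSON form can be generated rather than typed by hand. A small sketch; buildRangeQuery is a hypothetical helper, and the field name and the 359999 ms window are borrowed from the question:

```javascript
// Hypothetical helper: returns an extended JSON range query as a string.
function buildRangeQuery(field, from, windowMillis) {
  var query = {};
  query[field] = {
    "$gte": { "$date": from.toISOString() },
    "$lte": { "$date": new Date(from.getTime() + windowMillis).toISOString() }
  };
  return JSON.stringify(query);
}

var exampleQuery = buildRangeQuery("remindmeDate", new Date("2017-06-16T07:38:04.811Z"), 359999);
```

JSON.stringify takes care of the quoting, so the result is always valid JSON.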
I used your unchanged expression in an UpdateAttribute processor to evaluate a new flowFile attribute.
Your expression:
{
"remindmeDate": {
"$gte": "${now():format("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'",'GMT')}",
"$lte": "${now():toNumber():plus(359999):format("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'",'GMT')}"
}
}
the result:
{
"remindmeDate": {
"$gte": "2017-06-16T07:38:04.811Z",
"$lte": "2017-06-16T07:44:04.810Z"
}
}
and this is a correct json object.
Finally, I found that the GetMongo Query property does not support NiFi expression language (NiFi 1.2.0 and 1.3.0). Just hover over the question mark next to the parameter.
That means there is no way to build a dynamic query.
It seems an issue needs to be registered: https://issues.apache.org/jira/browse/NIFI-4082
But it's possible to specify the current date and a relative date in the MongoDB query language, something like this:
{
"remindmeDate": {
"$gte": new Date(),
"$lte": new Date(ISODate().getTime() + 359999)
}
}
NiFi's GetMongo Query field doesn't support EL, so I created a stored function in MongoDB for my dynamic query and called it from NiFi.
{
"_id" : "reminderDateGMT",
"value" : function (reminderDateGMT) {
var reminder = new Date(reminderDateGMT)
var fromDate = new Date();
var toDate = new Date(new Date().getTime()+(1000 * 60 * 60));
if ((reminder >= fromDate) && (reminder <=toDate )) {
return true;
} else {
return false;
}
}
}
In nifi GetMongo Query,
{
"$where": "reminderDateGMT(this.reminderDateGMT)"
}
I think you may be able to use the unescapeJson expression language function to handle this. You have to provide valid JSON (escaped quotes) for the field level (PropertyDescriptor in NiFi parlance) validation, but the expression language string expects unescaped JSON during expression parsing, so the unescapeJson function removes the escapes first and then format receives a properly quoted string.
{
"remindmeDate": {
"$gte": "${now():format(\"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'\":unescapeJson(),'GMT')}",
"$lte": "${now():toNumber():plus(359999):format(\"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'\":unescapeJson(),'GMT')}"
}
}
I had a similar discussion on the mailing list, and here is the solution I found that works:
Mongo console:
db.system.js.save({
"_id": "lastFiveMinutes",
"value": function() {
return new Date(ISODate().getTime() - (1000 * 60 * 5));
}
});
db.loadServerScripts();
Query field:
{
"$where": "obj.ts >= lastFiveMinutes()"
}
Note: you probably want to set this on a timer in the scheduling property.
I know this is a pretty old post, but I spent many hours on this and found a solution that worked for me.
Use an UpdateAttribute processor to create two attributes that define the date range for which I need to fetch the MongoDB documents:
startDate: "${now():format('yyyy-MM-dd')}"
endDate : "${now():toNumber():plus(86400000):format('yyyy-MM-dd')}"
After that pass these attributes to GetMongo processor:
Query : {"createdDate":{"$gte":ISODate(${startDate}), "$lt":ISODate(${endDate})}}

How to get pointer of current document to update in updateMany

I have the latest MongoDB 3.2 and a collection of many items that have a timestamp.
I need to convert the milliseconds to a Date object, and currently I use this function:
db.myColl.find().forEach(function (doc) {
doc.date = new Date(doc.date);
db.myColl.save(doc);
})
It took a very long time to update 2 million rows.
I tried to use updateMany (it seems to be very fast), but how can I get access to the current document? Is there any chance to rewrite the query above using updateMany?
Thank you.
You can leverage other bulk update APIs like the bulkWrite() method which will allow you to use an iterator to access a document, manipulate it, add the modified document to a list and then send the list of the update operations in a batch to the server for execution.
The following demonstrates this approach, in which you use the cursor's forEach() method to iterate the collection and modify each document, at the same time pushing the update operation to a batch of about 1000 documents, which can then be updated at once using the bulkWrite() method.
This is as efficient as using the updateMany() since it uses the same underlying bulk write operations:
var cursor = db.myColl.find({"date": { "$exists": true, "$type": 1 }}),
bulkUpdateOps = [];
cursor.forEach(function(doc){
var newDate = new Date(doc.date);
bulkUpdateOps.push({
"updateOne": {
"filter": { "_id": doc._id },
"update": { "$set": { "date": newDate } }
}
});
if (bulkUpdateOps.length == 1000) {
db.myColl.bulkWrite(bulkUpdateOps);
bulkUpdateOps = [];
}
});
if (bulkUpdateOps.length > 0) { db.myColl.bulkWrite(bulkUpdateOps); }
The current query is the only solution for setting a field value from itself or from another field's value (one could compute some data using more than one field from the document).
There is a way to improve the performance of that query: execute it via the mongo shell directly on the server (so no data is passed to the client).
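For readers on newer servers: from MongoDB 4.2 (so not the 3.2 in the question), updateMany() also accepts an aggregation pipeline, and inside it "$date" refers to the current document's own field. A sketch under that version assumption; the type filter is my guess at how the millisecond values are stored:

```javascript
// Assumes MongoDB 4.2+, where update commands accept an aggregation pipeline.
// Assumption: the millisecond timestamps are stored as double or long.
var filter = { "date": { "$type": ["double", "long"] } };
var pipeline = [
  // "$date" is read from the document being updated.
  { "$set": { "date": { "$toDate": "$date" } } }
];

// In the shell:
// db.myColl.updateMany(filter, pipeline)
```

This runs entirely on the server, so no documents travel to the client.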

How to change date and time in MongoDB? [duplicate]

This question already has answers here:
Update MongoDB field using value of another field
(12 answers)
Closed 5 years ago.
I have a DB with names and dates. I need to replace the old date with the date 3 days after it. For example, if the old date is 01.02.2015, the new one is 03.02.2015.
I tried just setting another date for all documents, but that means all exams would be on the same day.
$ db.getCollection('school.exam').update( {}, { $set : { "oldDay" : new ISODate("2016-01-11T03:34:54Z") } }, true, true);
The problem is just to replace old date with some random days.
Since MongoDB doesn't yet support applying the $inc operator to dates (there is a JIRA ticket for that), as an alternative you need to iterate the cursor returned by the find() method using the forEach() method. In the loop, convert the old date field to a timestamp, add the number of days in milliseconds to the timestamp, and then update the field using the $set operator.
Use the Bulk API for bulk updates, which offers better performance because the operations are sent to the server in batches of, say, 1000: one round trip per 1000 updates instead of one per update.
The following demonstrates this approach. The first example uses the Bulk API available in MongoDB versions >= 2.6 and < 3.2. It updates all the documents in the collection by adding 3 days to the date field:
var bulk = db.getCollection("school.exam").initializeUnorderedBulkOp(),
counter = 0,
daysInMilliSeconds = 86400000,
numOfDays = 3;
db.getCollection("school.exam").find({ "oldDay": { $exists : true, "$type": 9 }}).forEach(function (doc) { // $type 9 matches BSON dates; getTime() below needs a Date, not a string
var incDate = new Date(doc.oldDay.getTime() + (numOfDays * daysInMilliSeconds ));
bulk.find({ "_id": doc._id }).updateOne({
"$set": { "oldDay": incDate }
});
counter++;
if (counter % 1000 == 0) {
bulk.execute(); // Execute per 1000 operations and re-initialize every 1000 update statements
bulk = db.getCollection('school.exam').initializeUnorderedBulkOp();
}
})
if (counter % 1000 != 0) { bulk.execute(); }
The next example applies to MongoDB version 3.2, which deprecated the Bulk API and provided a newer set of APIs using bulkWrite():
var bulkOps = [],
daysInMilliSeconds = 86400000,
numOfDays = 3;
db.getCollection("school.exam").find({ "oldDay": { $exists : true, "$type": 9 }}).forEach(function (doc) { // $type 9 matches BSON dates; getTime() below needs a Date, not a string
var incDate = new Date(doc.oldDay.getTime() + (numOfDays * daysInMilliSeconds ));
bulkOps.push(
{
"updateOne": {
"filter": { "_id": doc._id } ,
"update": { "$set": { "oldDay": incDate } }
}
}
);
})
db.getCollection("school.exam").bulkWrite(bulkOps, { 'ordered': true });
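The date arithmetic used in both loops can be tried on its own, outside MongoDB. A minimal sketch; addDays is a hypothetical helper name, and a day is treated as a fixed 86400000 ms, as in the examples above:

```javascript
var MILLIS_PER_DAY = 86400000;

// Hypothetical helper: returns a new Date shifted by the given number of days.
// The input Date is left unchanged.
function addDays(date, numOfDays) {
  return new Date(date.getTime() + numOfDays * MILLIS_PER_DAY);
}

var original = new Date("2015-02-01T00:00:00Z");
var shifted = addDays(original, 3);
```

A negative numOfDays moves the date backwards in the same way.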

Aggregate MongoDB results by ObjectId date

How can I aggregate my MongoDB results by ObjectId date. Example:
Default cursor results:
cursor = [
{'_id': ObjectId('5220b974a61ad0000746c0d0'),'content': 'Foo'},
{'_id': ObjectId('521f541d4ce02a000752763a'),'content': 'Bar'},
{'_id': ObjectId('521ef350d24a9b00077090a5'),'content': 'Baz'},
]
Projected results:
projected_cursor = [
{'2013-09-08':
{'_id': ObjectId('5220b974a61ad0000746c0d0'),'content': 'Foo'},
{'_id': ObjectId('521f541d4ce02a000752763a'),'content': 'Bar'}
},
{'2013-09-07':
{'_id': ObjectId('521ef350d24a9b00077090a5'),'content': 'Baz'}
}
]
This is what I'm currently using in PyMongo to achieve these results, but it's messy and I'd like to see how I can do it using MongoDB's aggregation framework (or even MapReduce):
cursor = db.find({}, limit=10).sort("_id", pymongo.DESCENDING)
messages = [x for x in cursor]
this_date = lambda x: x['_id'].generation_time.date()
dates = set([this_date(message) for message in messages])
dates_dict = {date: [m for m in messages if this_date(m) == date] for date in dates}
And yes, I know that the easiest way would be to simply add a new date field to each record then aggregate by that, but that's not what I want to do right now.
Thanks!
Update: There is a built in way to do this now, see https://stackoverflow.com/a/51766657/295687
There is no way to accomplish what you're asking with MongoDB's aggregation framework, because there is no aggregation operator that can turn ObjectIds into something date-like (there is a JIRA ticket, though). You should be able to accomplish what you want using map-reduce, however:
// map function
function domap() {
// turn ObjectId --> ISODate
var date = this._id.getTimestamp();
// format the date however you want
var year = date.getFullYear();
var month = date.getMonth() + 1; // getMonth() is zero-based
var day = date.getDate();
// yields date string as key, entire document as value
emit(year+"-"+month+"-"+day, this);
}
// reduce function
function doreduce(datestring, docs) {
return {"date":datestring, "docs":docs};
}
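What getTimestamp() does in the map function can be reproduced by hand: the first 8 hex characters of an ObjectId are the creation time in seconds since the Unix epoch. A standalone sketch that works on the hex strings from the question; objectIdToDate, dayKey and groupByDay are hypothetical names, and days are cut in UTC:

```javascript
// Hypothetical helper: reads the creation time out of an ObjectId hex string.
function objectIdToDate(hex) {
  var seconds = parseInt(hex.substring(0, 8), 16);
  return new Date(seconds * 1000);
}

// Formats a Date as a zero-padded "YYYY-MM-DD" key (UTC).
function dayKey(date) {
  return date.toISOString().slice(0, 10);
}

// Groups documents by the UTC day encoded in their _id.
function groupByDay(docs) {
  var groups = {};
  docs.forEach(function (doc) {
    var key = dayKey(objectIdToDate(doc._id));
    (groups[key] = groups[key] || []).push(doc);
  });
  return groups;
}

var grouped = groupByDay([
  { _id: "5220b974a61ad0000746c0d0", content: "Foo" },
  { _id: "521f541d4ce02a000752763a", content: "Bar" },
  { _id: "521ef350d24a9b00077090a5", content: "Baz" }
]);
```

In the shell, the hex string of a real ObjectId is available through its str property.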
The Jira Ticket pointed out by llovett has been solved, so now you can use date operators like $isoWeek and $year to extract this information from an ObjectId.
Your aggregation would look something like this:
{
"$project":
{
"_id": {
"$dateFromParts" : {
"year": { "$year": "$_id"},
"month": { "$month": "$_id"},
"day": { "$dayOfMonth": "$_id"}
}
}
}
}
So this doesn't answer my question directly, but I did find a better way to replace all that lambda nonsense above using Python's setdefault:
d = {}
for message in messages:
key = message['_id'].generation_time.date()
d.setdefault(key,[]).append(message)
Thanks to @raymondh for the hint in his PyCon talk:
Transforming Code into Beautiful, Idiomatic Python