Should I use the timestamp in "_id"? - mongodb

I need monitor the time of the records been created, for further query and modify.
first thing flashed in my mind is give the document a "createDateTime" field, with the default value of "new Date()", but Mongodb said the document _id has a timestamp embedded with, and the id was generated when the document was created, so it sounds dummy to add a new field for that.
for too many times, I've seen people set a "createDateTime" for their data, and I don't know if they know about the details of mongodb's _id.
I want know should I use the _id as a "createDateTime" field? what is the best practice?
and the pros and cons.
thanks for any tips.

I'd actually say it depends on how you want to use the date.
For example, it's not actionable using the aggregation framework Date operators.
This will fail for example:
db.test.aggregate( { $group : { _id: { $year: "$_id" } } })
The following error occurs:
"errmsg" : "exception: can't convert from BSON type OID to Date"
(The date cannot be extracted from the ObjectId.)
So, operations that are normally simple date operations become much more complex if you wanted to do any sort of date math in an aggregation. It would be far easier to have a createDateTime stamp. Counting the number of documents created in a particular year and month would be simple using aggregation with a distinct createdDateTime field.
You can sort on an ObjectId, to some degree. The remaining 8 bytes of the ObjectId aren't sortable in a meaningful way. Most MongoDB drivers default to creating the ObjectId within the driver and not on the database. So, if you've got multiple clients (like web servers for example) creating new documents (and new ObjectIds), the time stamps will only be as accurate as the various servers.
Also, depending the precision you'd need, an ISODate value is stored using 8 bytes, rather than the 4 used in an ObjectId.

Yes, you should. There is no reason not to do, besides the human readability while directly looking into the database. See also here and here.
If you want to use the aggregation framework to group by the date within _id, this is not possible yet as WiredPrairie correctly said. There is an open jira ticket for that, you might watch. But of course you can do this with Map-Reduce and ObjectID.getTimestamp(). An example for that can be found here.

Related

Why do we need created_at in MongoDB

Why do we when the created_at field when the timestamp can be found in the first 4 bytes of the ObjectId
ObjectId("5349b4ddd2781d08c09890f4").getTimestamp()
Taken from MongoDB Docs
There are several cases where it makes sense to do so:
When you need better precision - ObjectId.getTimestamp() is precise up to seconds, while Date fields store milliseconds. Compare this in mongo shell: new Date() yields ISODate("2016-01-03T21:21:38.032Z"), while ObjectId().getTimestamp() yields ISODate("2016-01-03T21:21:50Z").
When you are not using ObjectId at all - it is often taken as a given that _id field should be populated with ObjectId, while in fact ObjectId is only a default used by most of the drivers and MongoDB itself doesn't impose it - on the contrary, it is encouraged to use any "natural" unique ID if it exists for the documents. In this case though you will have to store "creation timestamp" yourself if you need it.
Usability - if you rely on the presence of this field and the data in it, it might be better, at least from design standpoint, to be explicit about it. This is more a matter of taste though. However, as noted in comments, if you also want to filter or sort by "creation timestamp" - it will be easier to do having a dedicated field for it and using query operators like $gt, for example, directly on it.
As you said, like it states clearly in the documentation:
Since the _id ObjectId by default stores the 4 byte timestamp, in most cases you do not need to store the creation time of any document.
And you may use ObjectId("5349b4ddd2781d08c09890f4").getTimestamp() in order to get the creation date in ISO date format.
It is also a matter of convenience to the costumer (us) to have a service like that, as it makes our attempt of getting the creating date and performing actions on it much more intuitive and easy.

Sort collection by insertion datetime using only id field

I have a collection of data and I want to get it sorted by insertion time. I have not any additional fields to store the insert time. But as I found out I can get this time from Id.
I have tried this code:
return bookmarks.find({}, {sort: {_id.getTimestamp(): 1}, limit: 10});
or
return bookmarks.find({}, {sort: {ObjectId(_id).getTimestamp(): 1}, limit: 10});
but get the error message:
=> Your application has errors. Waiting for file change.
Is there any way to sort collection by insertion datetime using only id field ?
At the moment this isn't possible with Meteor, even if it is with MongoDB. The ObjectID's created with meteor don't bear a timestamp. See http://docs.meteor.com/#collection_object_id
The reason for this is client side code can insert code and it can arrive late on the server, hence there is no guarantee the timestamp portion of the ObjectID will be accurate. In addition to the latency the client side's date is used meaning if they're off it's going to get you incorrect data. I think this is the reason they use an ObjectID but it is completely random.
If you want to sort by date you have to store the time/date separately.
The part what i striked out is not accurate. Meteor use it is own id generation which is based on a random string that is while does not apply the doc what i linked before. Check sasha.sochka's comment under.
It is nearly but not 100% good if you just sort for the _id field . While as it is constructed the first 4 byte is the timestamp in secs (so sorting for the getTimestamps value is not better). Under one second resolution you cannot get the exact order, as it is mentioned in the documentation: http://docs.mongodb.org/manual/reference/object-id/#objectid
It is still true that you can try to check the exact order of the insert/update ops against your collection in the oplog, if you have an oplog, but as it is a capped collection anyway you will see the recent operations only. http://docs.mongodb.org/manual/core/replica-set-oplog/.

Generation of _id vs. ObjectId autogeneration in MongoDB

I'm developing an application that create permalinks. I'm not sure how save the documents in MondoDB. Two strategies:
ObjectId autogeneration
MongoDB autogenerates the _id. I need to create an index on the permalink field because I get the information by the permalink. Also I can access to the creation time of the ObjectId, using the getTimestamp() method, so datetime fields seems to be redundant but if I delete this field I need two calls to MongoDB one to take the information and another to take the timestamp.
{
"_id": ObjectId("5210a64f846cb004b5000001"),
"permalink": "ca8W7mc0ZUx43bxTuSGN",
"data": "a lot of stuff",
"datetime": ISODate("2013-08-18T11:47:43.460+-100")
}
Generate _id
I generate the _id with the permalink.
{
"_id": "ca8W7mc0ZUx43bxTuSGN",
"data": "a lot of stuff",
"datetime": ISODate("2013-08-18T11:47:43.460+-100")
}
I not see any advantage to use ObjectIds. Am I missing something?
ObjectIds are there for situations where you don't have a unique key for every document in a collection. They're unique, so you don't have to worry about conflicts and they shard reasonably well in large deployments without too much worry (they have they're pros and cons, read more here).
The ObjectId also contains the timestamp of the client where the ObjectId was generated (unless the DB server is configured to generate all keys). With that, as you noticed, you can use the time stamp to perform some date operations. However, if you plan on using the Aggregation Framework, you'll find that you can't use an ObjectId in any date operations currently (issue). If you want to use the AF, you'll need a second field that contains the date, unfortunately doubly storing it with the ObjectId's internal value.
If you can be assured that the _id you're generating is unique, then there's not much reason to use an ObjectId in your data structure.

MongoDB - most efficient way of getting the last version of a document

I'm using MongoDB to hold a collection of documents.
Each document has an _id (version) which is an ObjectId. Each document has a documentId which is shared across the different versions. This too is an OjectId assigned when the first document was created.
What's the most efficient way of finding the most up-to-date version of a document given the documentId?
I.e. I want to get the record where _id = max(_id) and documentId = x
Do I need to use MapReduce?
Thanks in advance,
Sam
Add index containing both fields (documentId, _id) and don't use max (what for)? Use query with documentId = x, order DESC by _id and limit(1) results to get the latest. Remember about proper sorting order of index (DESC also)
Something like that
db.collection.find({documentId : "x"}).sort({_id : -1}).limit(1)
Other approach (more denormalized) would be to use other collecion with documents like:
{
documentId : "x",
latestVersionId : ...
}
Use of atomic operations would allow to safely update this collection. Adding proper index would make queries fast as lightning.
There is one thing to take into account - i'm not sure whether ObjectID can always be safely used to order by for latest version. Using timestamp may be more certain approach.
I was typing the same as Daimon's first answer, using sort and limit. This is probably not recommended, especially with some drivers (which use random numbers instead of increments for the least significant portion), because of the way the _id is generated. It has second [as opposed to something smaller, like millisecond] resolution as the most significant portion, but the last number can be a random number. So if you had a user save twice in a second (probably not likely, but worth noting), you might end up with a slightly out of order latest document.
See http://www.mongodb.org/display/DOCS/Object+IDs#ObjectIDs-BSONObjectIDSpecification for more details on the structure of the ObjectID.
I would recommend adding an explicit versionNumber field to your documents, so you can query in a similar fashion using that field, like so:
db.coll.find({documentId: <id>}).sort({versionNum: -1}).limit(1);
edit to answer question in comments
You can store a regular DateTime directly in MongoDB, but it will only store the milliseconds precision in a "DateTime" format in MongoDB. If that's good enough, it's simpler to do.
BsonDocument doc = new BsonDocument("dt", DateTime.UtcNow);
coll.Insert (doc);
doc = coll.FindOne();
// see it doesn't have precision...
Console.WriteLine(doc.GetValue("dt").AsUniversalTime.Ticks);
If you want .NET DateTime (ticks)/Timestamp precision, you can do a bunch of casts to get it to work, like:
BsonDocument doc = new BsonDocument("dt", new BsonTimestamp(DateTime.UtcNow.Ticks));
coll.Insert (doc);
doc = coll.FindOne();
// see it does have precision
Console.WriteLine(new DateTime(doc.GetValue("dt").AsBsonTimestamp.Value).Ticks);
update again!
Looks like the real use for BsonTimestamp is to generate unique timestamps within a second resolution. So, you're not really supposed to abuse them as I have in the last few lines of code, and it actually will probably screw up the ordering of results. If you need to store the DateTime at a Tick (100 nanosecond) resolution, you probably should just store the 64-bit int "ticks", which will be sortable in mongodb, and then wrap it in a DateTime after you pull it out of the database again, like so:
BsonDocument doc = new BsonDocument("dt", DateTime.UtcNow.Ticks);
coll.Insert (doc);
doc = coll.FindOne();
DateTime dt = new DateTime(doc.GetValue("dt").AsInt64);
// see it does have precision
Console.WriteLine(dt.Ticks);

MongoDB - Query embbeded documents

I've a collection named Events. Each Eventdocument have a collection of Participants as embbeded documents.
Now is my question.. is there a way to query an Event and get all Participants thats ex. Age > 18?
When you query a collection in MongoDB, by default it returns the entire document which matches the query. You could slice it and retrieve a single subdocument if you want.
If all you want is the Participants who are older than 18, it would probably be best to do one of two things:
Store them in a subdocument inside of the event document called "Over18" or something. Insert them into that document (and possibly the other if you want) and then when you query the collection, you can instruct the database to only return the "Over18" subdocument. The downside to this is that you store your participants in two different subdocuments and you will have to figure out their age before inserting. This may or may not be feasible depending on your application. If you need to be able to check on arbitrary ages (i.e. sometimes its 18 but sometimes its 21 or 25, etc) then this will not work.
Query the collection and retreive the Participants subdocument and then filter it in your application code. Despite what some people may believe, this isnt terrible because you dont want your database to be doing too much work all the time. Offloading the computations to your application could actually benefit your database because it now can spend more time querying and less time filtering. It leads to better scalability in the long run.
Short answer: no. I tried to do the same a couple of months back, but mongoDB does not support it (at least in version <= 1.8). The same question has been asked in their Google Group for sure. You can either store the participants as a separate collection or get the whole documents and then filter them on the client. Far from ideal, I know. I'm still trying to figure out the best way around this limitation.
For future reference: This will be possible in MongoDB 2.2 using the new aggregation framework, by aggregating like this:
db.events.aggregate(
{ $unwind: '$participants' },
{ $match: {'age': {$gte: 18}}},
{ $project: {participants: 1}
)
This will return a list of n documents where n is the number of participants > 18 where each entry looks like this (note that the "participants" array field now holds a single entry instead):
{
_id: objectIdOfTheEvent,
participants: { firstName: 'only one', lastName: 'participant'}
}
It could probably even be flattened on the server to return a list of participants. See the officcial documentation for more information.