Remove documents if timestamp is too old - MongoDB

I know it is possible to remove documents whose saved date has passed with this command:
db.course.deleteMany({date: {"$lt": ISODate()}})
I'm trying to do the same thing, but checking whether a timestamp has passed.
All saved timestamps look like this one:
1492466400000
Is it possible to write a command with a condition that deletes all documents whose timestamp is too old?
EDIT
I use millisecond timestamps.

In MongoDB 3.2 you can also use TTL indexes, where you tell MongoDB "please remove all documents if {fieldDate} is older than 3600 seconds". This is pretty useful for logging (e.g. remove all logs older than 3 months).
Maybe it's not your use case, but I think it's good to know.
TTL indexes are special single-field indexes that MongoDB can use to automatically remove documents from a collection after a certain amount of time or at a specific clock time. Data expiration is useful for certain types of information like machine-generated event data, logs, and session information that only need to persist in a database for a finite amount of time.
db.eventlog.createIndex( { "lastModifiedDate": 1 }, { expireAfterSeconds: 3600 } )
https://docs.mongodb.com/v3.2/core/index-ttl/
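Note that a TTL index only expires documents whose indexed field holds a BSON Date (or an array of Dates); numeric millisecond timestamps like the one in the question are never removed. As a hypothetical illustration, a document inserted like this would be deleted roughly an hour after its lastModifiedDate:
db.eventlog.insertOne({ message: "user logged in", lastModifiedDate: new Date() })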

I found a way to do this:
db.course.deleteMany({date: {"$lt": ISODate()*1}})
The ISODate()*1 expression converts the Date into a millisecond timestamp, and new Date(timestamp) converts a timestamp back into a Date.
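To use a cutoff other than "now", the millisecond value can also be computed directly. A minimal sketch, with a 30-day retention window chosen purely as an example:
// delete documents whose millisecond timestamp is more than 30 days old
var cutoff = Date.now() - 30 * 24 * 60 * 60 * 1000;
db.course.deleteMany({ date: { "$lt": cutoff } });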

Related

MongoDB: How to get the last updated timestamp of the last updated document in a collection

Is there a simple or elegant method (or a query I can write) to retrieve the last updated timestamp (of the last updated document) in a collection? I can write a query like this to find the last inserted document:
db.collection.find().limit(1).sort({$natural:-1})
but I need information about the last updated document (it could be an insert or an update).
I know that one way is to query the oplog collection for the last record from a collection. But that seems like an expensive operation, given that the oplog can be very large (and it is not trustworthy either, as it is a capped collection). Is there a better way to do this?
Thanks!
You could get the last insert time the same way you mentioned in the question:
db.collection.find().sort({'_id': -1}).limit(1)
But there isn't any good way to see the last update/delete time. If you are using replica sets, you could get that from the oplog.
Or you could add a new field to the document, such as 'lastModified'.
You can also check out collection hooks. I hope this helps.
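Reading the latest operation from the oplog is possible on a replica set member. A minimal sketch, assuming a hypothetical namespace "mydb.mycollection":
// most recent oplog entry touching the collection (replica sets only)
db.getSiblingDB("local").oplog.rs.find({ ns: "mydb.mycollection" }).sort({ $natural: -1 }).limit(1)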
One way to go about it is to have a field that holds the time of last update. You can name it updatedAt. Every time you make an update to the document, you'll just update the value to the current time. If you use the ISO format to store the time, you'll be able to sort without issues (that's what I use).
The other way is the _id field.
Method 1
db.collection.find().limit(1).sort({updatedAt: -1})
Method 2
db.collection.find().limit(1).sort({_id: -1})
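To keep updatedAt current without computing the time on the client, MongoDB's $currentDate update operator can set it on every write. A minimal sketch; the someId variable and the status field are hypothetical:
db.collection.updateOne(
  { _id: someId },
  { $set: { status: "done" }, $currentDate: { updatedAt: true } }
);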
You can also try:
db.collection.find().sort({$natural:-1}).limit(1);

Should I use the timestamp in "_id"?

I need to monitor the time at which records are created, for further querying and modification.
The first thing that flashed into my mind was to give the document a "createDateTime" field with a default value of new Date(). But MongoDB's documentation says the document _id has a timestamp embedded in it, generated when the document was created, so it seems redundant to add a new field for that.
Many times I've seen people set a "createDateTime" field on their data, and I don't know whether they are aware of the details of MongoDB's _id.
Should I use the _id as a "createDateTime" field? What is the best practice, and what are the pros and cons?
Thanks for any tips.
I'd actually say it depends on how you want to use the date.
For example, it isn't usable with the aggregation framework's Date operators.
This will fail for example:
db.test.aggregate( { $group : { _id: { $year: "$_id" } } })
The following error occurs:
"errmsg" : "exception: can't convert from BSON type OID to Date"
(The date cannot be extracted from the ObjectId.)
So, operations that are normally simple date operations become much more complex if you want to do any sort of date math in an aggregation. It would be far easier to have a createDateTime stamp. Counting the number of documents created in a particular year and month, for example, would be simple using aggregation with a distinct createDateTime field.
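A minimal sketch of that month-by-month count, assuming documents carry a BSON Date field named createDateTime:
db.test.aggregate([
  { $group: {
      _id: { year: { $year: "$createDateTime" }, month: { $month: "$createDateTime" } },
      count: { $sum: 1 }
  } }
])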
You can sort on an ObjectId, but only to some degree: the remaining 8 bytes of the ObjectId aren't sortable in a meaningful way. Also, most MongoDB drivers default to creating the ObjectId in the driver rather than on the database server, so if you've got multiple clients (like web servers, for example) creating new documents (and new ObjectIds), the timestamps will only be as accurate as those servers' clocks.
Also, depending on the precision you need, note that an ISODate value is stored using 8 bytes, rather than the 4 used for the timestamp in an ObjectId.
Yes, you should. There is no reason not to, besides human readability when looking directly into the database. See also here and here.
If you want to use the aggregation framework to group by the date within _id, that is not possible yet, as WiredPrairie correctly said. There is an open JIRA ticket for that, which you might watch. But of course you can do this with map-reduce and ObjectID.getTimestamp(). An example of that can be found here.
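In the mongo shell, the creation time embedded in an ObjectId can be read directly. A minimal sketch:
// returns an ISODate with one-second precision
db.test.findOne()._id.getTimestamp()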

What's the benefit of MongoDB's TTL collections vs. purging data with a housekeeper?

I have been thinking about using the built-in TTL feature, but it's not easy to dynamically change the expiration date.
Since MongoDB uses a background task to purge the data, is there any downside to just coding my own purging function based on "> certain_date" and running it, say, once a day?
This way I can dynamically change the TTL value, and the date field won't need its own single-field index. I can reuse the field as part of a compound index to minimize the number of indexes.
There are 2 ways to set the expiration date on a TTL collection:
at a global level, when creating the index
per document, as a field in the document
These two modes are mutually exclusive.
Global expiry
If you want all your documents to expire 3 months after creation, use the first mode by creating the index like the following:
db.events.ensureIndex({ "createdAt": 1 }, { expireAfterSeconds: 7776000 })
If you later decide to change the expiry to "4 months", you just need to update the expireAfterSeconds value using the collMod command:
db.runCommand({"collMod" : "events" , "index" : { "keyPattern" : {"createdAt" : 1 } , "expireAfterSeconds" : 10368000 } })
Per-document expiry
If you want every document to have its own expiration date, save that specific date in a field like "expiresAt", then index your collection with:
db.events.ensureIndex({ "expiresAt": 1 }, { expireAfterSeconds: 0 })
I have been thinking about using the built-in TTL feature, but it's not easy to dynamically change the expiration date
That's odd. Why would that be a problem? If your document has a field Expires, you can update that field at any time to dynamically prolong or shorten the life of the document.
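For instance, a minimal sketch of extending one document's life by a day, reusing the expiresAt field from the index example above (someId is hypothetical):
db.events.updateOne({ _id: someId }, { $set: { expiresAt: new Date(Date.now() + 86400 * 1000) } })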
Is there any downside to just coding my own purging function based on "> certain_date" and running it, say, once a day?
You have to code, document, and maintain it.
Deleting a whole lot of documents at once can be expensive and lead to a lot of re-ordering, so it's probably helpful to run the purging more often.
Minimizing the number of indexes is a good thing, but the question is whether it's really worth the effort. Only you can answer that. My advice: start with what's already there if at all possible, and come up with something better if and only if you really have to.

Sort a collection by insertion datetime using only the _id field

I have a collection of data and I want to get it sorted by insertion time. I don't have any additional fields that store the insert time, but as I found out, I can get this time from the _id.
I have tried this code:
return bookmarks.find({}, {sort: {_id.getTimestamp(): 1}, limit: 10});
or
return bookmarks.find({}, {sort: {ObjectId(_id).getTimestamp(): 1}, limit: 10});
but I get the error message:
=> Your application has errors. Waiting for file change.
Is there any way to sort the collection by insertion datetime using only the _id field?
At the moment this isn't possible with Meteor, even if it is with MongoDB. The ObjectIDs created with Meteor don't bear a timestamp. See http://docs.meteor.com/#collection_object_id
The reason for this is that client-side code can insert documents, and they can arrive late on the server, so there is no guarantee that the timestamp portion of the ObjectID will be accurate. On top of the latency, the client side's clock is used, meaning that if it is off you'll get incorrect data. I think this is why Meteor uses an ObjectID that is completely random.
If you want to sort by date, you have to store the time/date separately.
The part I struck out is not accurate: Meteor uses its own id generation, which is based on a random string, so the documentation I linked above does not apply. Check sasha.sochka's comment below.
Sorting on the _id field is nearly, but not 100%, reliable. By construction, the first 4 bytes of an ObjectId are the timestamp in seconds (so sorting on the getTimestamp() value is no better). Below one-second resolution you cannot get the exact order, as mentioned in the documentation: http://docs.mongodb.org/manual/reference/object-id/#objectid
It is still true that you can check the exact order of the insert/update operations against your collection in the oplog, if you have one, but as it is a capped collection you will only see the most recent operations: http://docs.mongodb.org/manual/core/replica-set-oplog/
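For a collection that does use real MongoDB ObjectIDs, the broken queries from the question reduce to a plain sort on _id. A minimal sketch:
// ascending insertion order, accurate to one second
return bookmarks.find({}, { sort: { _id: 1 }, limit: 10 });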

MongoDB update many documents with different timestamps in one update

I have 10000 documents in one MongoDB collection. I'd like to update all of them with datetime values that are 1 second apart (so all the datetime values are unique and spaced 1 second apart). Is there any way to do this with a single update, instead of updating each document in turn, which results in 10000 distinct update operations?
Thanks.
No, there is no way to do this with a single update statement. There are no expressions that run on the server to allow this type of update. There is a feature request for this, but it is not done yet, so it cannot be used.
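The closest workaround is to batch the per-document updates so they reach the server in far fewer round trips, even though they remain distinct operations. A minimal sketch using the shell's bulk API, with a hypothetical ts field:
// assign timestamps 1 second apart; the bulk API groups the writes into large batches
var bulk = db.collection.initializeUnorderedBulkOp();
var base = Date.now();
var i = 0;
db.collection.find({}, { _id: 1 }).forEach(function (doc) {
  bulk.find({ _id: doc._id }).updateOne({ $set: { ts: new Date(base + i * 1000) } });
  i++;
});
bulk.execute();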