Expire a Collection in Mongo using Casbah EnsureIndex - mongodb

I am attempting to expire a collection in Mongo using Casbah's ensureIndex API.
Based on this document
http://docs.mongodb.org/manual/tutorial/expire-data/
I am using Casbah's ensureIndex API as follows:
collection.ensureIndex(DBObject("status" -> 1, "expireAfterSeconds" -> 120))
to expire the collection in 2 minutes ...
The collection is not being evicted or expired.
Am I missing anything else here?
Thanks

There are a couple things to check:
Were you following the docs to the letter and creating an index on a status field that doesn't actually exist in your documents? (I had to at least ask...)
Does the status field contain JUST dates? It can theoretically be mixed, but only documents with a date type would be considered for expiration.
Have you checked your collection indexes to make sure the index was properly created?
To check for the index from the console run: db.collection.getIndexes(). If the index was created successfully, then double check you have the corresponding status fields in your documents and that they are proper dates.
Adding the index alone doesn't create the date field for you. You would need to add it to the documents or use an existing date field that is not part of any other index.
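To make the date-type requirement concrete, here is a minimal sketch in plain JavaScript that imitates the rule described above. It is not MongoDB code: isExpired and the sample documents are made up for illustration, and the real TTL monitor runs inside the server.

```javascript
// Sketch only: imitates the eligibility rule of a TTL index in plain JavaScript.
// isExpired and its parameters are hypothetical names, not part of any MongoDB API.
function isExpired(doc, field, expireAfterSeconds, now) {
  const value = doc[field];
  // Only values of the Date type are considered; strings, numbers and
  // missing fields are never expired.
  if (!(value instanceof Date)) return false;
  return now.getTime() - value.getTime() >= expireAfterSeconds * 1000;
}

const now = new Date("2014-01-01T00:10:00Z");
const docs = [
  { _id: 1, status: new Date("2014-01-01T00:00:00Z") }, // real date, 10 minutes old
  { _id: 2, status: "2014-01-01T00:00:00Z" },           // string, ignored
  { _id: 3 },                                            // field missing, ignored
  { _id: 4, status: new Date("2014-01-01T00:09:00Z") }, // real date, only 1 minute old
];

const expiredIds = docs
  .filter((d) => isExpired(d, "status", 120, now))
  .map((d) => d._id);
console.log(expiredIds); // only document 1 qualifies
```

Documents 2 and 3 stay forever no matter how long you wait, which is the usual reason a TTL index appears to "do nothing".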
Also note, from the docs:
TTL indexes expire data by removing documents in a background task
that runs every 60 seconds
So, if you have a 120-second expiration, bear in mind that the documents could remain for anywhere from 120 up to 179 seconds, give or take, depending on when the document expired and when the background task last ran.
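The arithmetic behind that range can be sketched as follows. This assumes the background task runs exactly every 60 seconds and deletes instantly, which is an idealization; on a busy server removal can take longer. removalWindow is a hypothetical helper name.

```javascript
// Sketch only: idealized removal window for a TTL index.
// Assumes the sweep runs exactly every sweepIntervalSeconds and finishes instantly.
function removalWindow(expireAfterSeconds, sweepIntervalSeconds = 60) {
  return {
    // Best case: a sweep starts at the very moment the document expires.
    earliest: expireAfterSeconds,
    // Worst case: the document expires just after a sweep, so it waits
    // almost a full interval for the next one (this bound is exclusive).
    latestExclusive: expireAfterSeconds + sweepIntervalSeconds,
  };
}

const windowFor120 = removalWindow(120);
console.log(windowFor120); // { earliest: 120, latestExclusive: 180 }
```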
edit: As noted in the comments - a collection itself cannot be removed based on a TTL index, the index only expires the documents in the collection.

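I think you are passing the options the wrong way. The index options go in a second DBObject, separate from the key specification; otherwise expireAfterSeconds is read as part of the key specification rather than as an option.
It should be:
collection.ensureIndex(DBObject("status" -> 1), DBObject("expireAfterSeconds" -> 120))
instead of:
collection.ensureIndex(DBObject("status" -> 1, "expireAfterSeconds" -> 120))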

Related

MongoDB: How to get the last updated timestamp of the last updated document in a collection

Is there a simple OR elegant method (or query that I can write) to retrieve the last updated timestamp (of the last updated document) in a collection. I can write a query like this to find the last inserted document
db.collection.find().limit(1).sort({$natural:-1})
but I need information about the last updated document (it could be an insert or an update).
I know that one way is to query the oplog collection for the last record from a collection. But it seems like an expensive operation given the fact that oplog could be of very large size (also not trustworthy as it is a capped collection). Is there a better way to do this?
Thanks!
You could get the last insert time same way you mentioned in the question:
db.collection.find().sort({'_id': -1}).limit(1)
But there isn't any good way to see the last update or delete time. If you are using replica sets, you could get that from the oplog.
Or you could add a new field to each document, such as 'lastModified'.
You can also check out collection-hooks. I hope this helps.
One way to go about it is to have a field that holds the time of last update. You can name it updatedAt. Every time you make an update to the document, you'll just update the value to the current time. If you use the ISO format to store the time, you'll be able to sort without issues (that's what I use).
The other way is the _id field.
Method 1
db.collection.find().limit(1).sort({updatedAt: -1})
Method 2
db.collection.find().limit(1).sort({_id: -1})
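As a rough illustration of Method 1, the sketch below keeps an updatedAt field as an ISO 8601 string and finds the most recently updated document in memory. It is plain JavaScript, not a driver call; touch and the sample documents are hypothetical. It also shows why ISO strings sort correctly: UTC timestamps in the same format compare lexicographically in chronological order.

```javascript
// Sketch only: touch is a hypothetical helper that stamps a document
// with the time of its last modification as an ISO 8601 UTC string.
function touch(doc, now = new Date()) {
  return { ...doc, updatedAt: now.toISOString() };
}

const stamped = [
  touch({ _id: "a" }, new Date("2015-03-01T10:00:00Z")),
  touch({ _id: "b" }, new Date("2015-03-02T09:30:00Z")),
  touch({ _id: "c" }, new Date("2015-02-28T23:59:59Z")),
];

// Equivalent in spirit to sort({updatedAt: -1}).limit(1):
// plain string comparison is enough because the format is fixed-width UTC.
const lastUpdated = [...stamped].sort((x, y) =>
  x.updatedAt < y.updatedAt ? 1 : x.updatedAt > y.updatedAt ? -1 : 0
)[0];

console.log(lastUpdated._id); // "b"
```

Storing a real Date instead of a string works just as well for sorting and is also what a TTL index would need.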
You can try:
db.collection.find().sort({$natural:-1}).limit(1);

How to find last update/insert/delete operation time on mongodb collection without objectid field

I have some unused collections in a MongoDB database. I have to find out when CRUD operations were last performed against the collections in the database. We have our own _id field instead of Mongo's default ObjectId, and we don't have any time field in the collections to find out the modification time. Is there any way to find the modification time of collections in MongoDB from metadata? Is there any data dictionary information, like in Oracle, to find this out? Please suggest some ideas or workarounds.
To make a long story short: MongoDB has a flexible schema. Simply add a date field. Since older entries don't have it, they can not be the last entry.
Let's call that field mtime.
So after adding a date field to your schema definition, we generate an index in descending order on the new field:
db.yourCollection.createIndex({mtime:-1})
Finding the last mtime for a collection now is easy:
db.yourCollection.find({"mtime":{"$exists":true}}).sort({"mtime":-1}).limit(1)
Do this for every collection. When the above query does not return a value within the timeframe you defined for purging a collection, simply drop it, since it has not been modified since you introduced the mtime field.
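The decision rule can be sketched like this. It is plain JavaScript with made-up names (isPurgeCandidate, introducedAt, maxIdleMs); the one assumption added here is that you also remember when the mtime field was introduced, so that a collection is not dropped before the waiting period has actually passed.

```javascript
// Sketch only: hypothetical helper deciding whether a collection looks unused.
// latestMtime:  Date of the newest mtime found, or null if the query returned nothing.
// introducedAt: Date when the mtime field was added to the schema (assumption).
// maxIdleMs:    how long a collection may stay unmodified before it is purged.
function isPurgeCandidate(latestMtime, introducedAt, now, maxIdleMs) {
  // With no mtime at all, the only thing known is that nothing changed
  // since the field was introduced.
  const lastKnownActivity = latestMtime === null ? introducedAt : latestMtime;
  return now.getTime() - lastKnownActivity.getTime() >= maxIdleMs;
}

const DAY_MS = 24 * 60 * 60 * 1000;
const introducedAt = new Date("2016-01-01T00:00:00Z");
const checkedAt = new Date("2016-03-01T00:00:00Z"); // 60 days later

// Never modified since the field was introduced: candidate for dropping.
const neverTouched = isPurgeCandidate(null, introducedAt, checkedAt, 30 * DAY_MS);
// Modified 10 days before the check: keep it.
const recentlyTouched = isPurgeCandidate(
  new Date("2016-02-20T00:00:00Z"), introducedAt, checkedAt, 30 * DAY_MS
);
console.log(neverTouched, recentlyTouched); // true false
```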
After your collections are cleaned up, you may remove the mtime field from your schema definition. To remove it from the documents, you can run a simple query:
db.yourCollection.update(
{ "mtime":{ $exists:true} },
{ "$unset":{ "mtime":""} },
{ multi: true}
)
There is no "data dictionary" to get this information in MongoDB.
If you've enabled the profiling level in advance to log all operations (db.setProfilingLevel(2)) and you haven't had many operations to log, so that the system.profile capped collection hasn't overwritten whatever logs you are interested in, you can get the information you need there—but otherwise it's gone.

What's the benefit of mongodb's ttl collection? vs purging data from a housekeeper?

I have been thinking about using the built-in TTL feature, but it's not easy to dynamically change the expiration date.
Since MongoDB uses a background task to purge the data, is there any downside to just coding my own purging function based on "> certain_date" and running it, say, once a day?
This way, I can dynamically change the TTL value, and this date field won't need its own single-field index. I can reuse this field as part of a compound index to minimize the number of indexes.
There are 2 ways to set the expiration date on a TTL collection:
at a global level, when creating the index
per document, as a field in the document
Those modes are exclusive.
Global expiry
If you want all your documents to expire 3 months after creation, use the first mode by creating the index like the following:
db.events.ensureIndex({ "createdAt": 1 }, { expireAfterSeconds: 7776000 })
If you later decide to change the expiry to "4 months", you just need to update the expireAfterSeconds value using the collMod command:
db.runCommand({"collMod" : "events" , "index" : { "keyPattern" : {"createdAt" : 1 } , "expireAfterSeconds" : 10368000 } })
Per-document expiry
If you want every document to have its own expiration date, save the specific date in a field like "expiresAt", then index your collection with:
db.events.ensureIndex({ "expiresAt": 1 }, { expireAfterSeconds: 0 })
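Both modes follow the same formula: a document becomes eligible for removal at the value of the indexed date field plus expireAfterSeconds. The sketch below only works through that arithmetic in plain JavaScript; expiryTime is a hypothetical helper, not a MongoDB function, and the sample dates are made up.

```javascript
// Sketch only: when does a document become eligible for removal?
// expiryTime is a hypothetical helper; the server does this check internally.
function expiryTime(doc, field, expireAfterSeconds) {
  return new Date(doc[field].getTime() + expireAfterSeconds * 1000);
}

const SECONDS_PER_DAY = 24 * 60 * 60;
console.log(90 * SECONDS_PER_DAY);  // 7776000, the "3 months" value above
console.log(120 * SECONDS_PER_DAY); // 10368000, the "4 months" value above

// Global expiry: every document lives 90 days past its createdAt.
const globalExpiry = expiryTime(
  { createdAt: new Date("2014-01-01T00:00:00Z") }, "createdAt", 90 * SECONDS_PER_DAY
);

// Per-document expiry: expireAfterSeconds is 0, so the stored date is the expiry.
const perDocumentExpiry = expiryTime(
  { expiresAt: new Date("2014-02-15T12:00:00Z") }, "expiresAt", 0
);

console.log(globalExpiry.toISOString());      // 2014-04-01T00:00:00.000Z
console.log(perDocumentExpiry.toISOString()); // 2014-02-15T12:00:00.000Z
```

With the per-document mode, prolonging or shortening a document's life is just an update of its expiresAt value.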
I have been thinking about using the built-in TTL feature, but it's not easy to dynamically change the expiration date
That's odd. Why would that be a problem? If your document has a field Expires, you can update that field at any time to dynamically prolong or shorten the life of the document.
Is there any downside to just coding my own purging function based on "> certain_date" and running it, say, once a day?
You have to code, document and maintain it
Deleting a whole lot of documents can be expensive and lead to a lot of re-ordering. It's probably helpful to run the purging more often
Minimizing the number of indexes is a good thing, but the question is whether it's really worth the effort. Only you can answer that. My advice is: start with something that's already there if at all possible, and come up with something better if and only if you really have to.

Sort collection by insertion datetime using only id field

I have a collection of data and I want to get it sorted by insertion time. I don't have any additional fields to store the insert time, but as I found out, I can get this time from the id.
I have tried this code:
return bookmarks.find({}, {sort: {_id.getTimestamp(): 1}, limit: 10});
or
return bookmarks.find({}, {sort: {ObjectId(_id).getTimestamp(): 1}, limit: 10});
but get the error message:
=> Your application has errors. Waiting for file change.
Is there any way to sort collection by insertion datetime using only id field ?
At the moment this isn't possible with Meteor, even if it is with MongoDB. The ObjectIDs created with Meteor don't bear a timestamp. See http://docs.meteor.com/#collection_object_id
The reason for this is client side code can insert code and it can arrive late on the server, hence there is no guarantee the timestamp portion of the ObjectID will be accurate. In addition to the latency the client side's date is used meaning if they're off it's going to get you incorrect data. I think this is the reason they use an ObjectID but it is completely random.
If you want to sort by date you have to store the time/date separately.
The part I struck out is not accurate. Meteor uses its own id generation, which is based on a random string, so the doc I linked before does not apply. Check sasha.sochka's comment below.
Sorting on the _id field alone is nearly, but not 100%, correct. As the ObjectId is constructed, its first 4 bytes are the timestamp in seconds (so sorting on the getTimestamp() value is no better). Within one second you cannot get the exact order, as mentioned in the documentation: http://docs.mongodb.org/manual/reference/object-id/#objectid
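For a standard MongoDB ObjectId (not Meteor's random string ids), the timestamp can be read straight from the first 8 hex characters. The sketch below is plain JavaScript working on the hex string form; objectIdSeconds is a hypothetical helper and the sample ids are illustrative values, not taken from the question.

```javascript
// Sketch only: read the creation time embedded in a standard ObjectId.
// objectIdSeconds is a hypothetical helper working on the 24-character hex form.
function objectIdSeconds(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hex characters");
  }
  // The first 4 bytes (8 hex characters) are seconds since the Unix epoch.
  return parseInt(hex.slice(0, 8), 16);
}

const ids = [
  "507f1f77bcf86cd799439012",
  "507f191e810c19729de860ea",
  "507f1f77bcf86cd799439011",
];

// Sorting the lowercase hex strings orders them by timestamp first,
// which is what sort({_id: 1}) relies on.
const sortedIds = [...ids].sort();
console.log(sortedIds.map(objectIdSeconds)); // [ 1350506782, 1350508407, 1350508407 ]

console.log(new Date(objectIdSeconds(ids[2]) * 1000).toISOString());
// 2012-10-17T21:13:27.000Z
```

The last two ids share the same second, so their relative order comes from the remaining bytes and says nothing reliable about which was inserted first.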
It is still true that you can try to check the exact order of the insert/update ops against your collection in the oplog, if you have an oplog, but as it is a capped collection anyway you will see the recent operations only. http://docs.mongodb.org/manual/core/replica-set-oplog/.

How to fetch the newest 10 posts from 10 categories in one MongoDB query?

I have a collections of documents (named posts) that each contain a field named category.
Each category is part of a categories collection. There are a fixed number of them (say 15).
How do I fetch the last 10 tldrs from each category?
Another solution would be to set a "flag" in each post which is actually part of the result, like:
topTen: true
Defining a sparse index on that flag would give the fastest query - at the price, of course, of the maintenance of that flag:
set the flag at insertion time (impact: one more index to update)
if it is tolerable that for a certain period the query returns 11 posts instead of 10, then trigger a background process that deletes (unsets) the 11th flag for that category
if it is not tolerable, find and unset the 11th flag at insert time
if the category of an existing post is altered, make sure the flags get set all right (for the old and the new category)
if a post gets removed that has the flag set: find and set the flag for the new 10th post
maybe you'd want to provide a periodically run process that makes sure the flags are all set as they should be
For more information on sparse indexes, see http://docs.mongodb.org/manual/core/indexes/#index-type-sparse
It will probably be better to first get the list of all categories and then, for each of them, get its 10 latest posts with a separate query.
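The shape of that approach can be sketched in plain JavaScript over an in-memory array. Each loop iteration stands in for one query against the posts collection; latestPerCategory, the createdAt field and the sample data are assumptions for illustration, since the question does not say which field orders the posts.

```javascript
// Sketch only: one "query" per category, each returning the n newest posts.
// latestPerCategory and the createdAt field are hypothetical.
function latestPerCategory(posts, categories, n) {
  const result = {};
  for (const category of categories) {
    // Stand-in for: find({category}).sort({createdAt: -1}).limit(n)
    result[category] = posts
      .filter((p) => p.category === category) // filter returns a new array
      .sort((a, b) => b.createdAt - a.createdAt)
      .slice(0, n);
  }
  return result;
}

const posts = [
  { title: "n1", category: "news", createdAt: 1 },
  { title: "t1", category: "tech", createdAt: 5 },
  { title: "n3", category: "news", createdAt: 3 },
  { title: "n2", category: "news", createdAt: 2 },
];

const newest = latestPerCategory(posts, ["news", "tech"], 2);
console.log(newest.news.map((p) => p.title)); // [ 'n3', 'n2' ]
console.log(newest.tech.map((p) => p.title)); // [ 't1' ]
```

With a fixed, small number of categories (say 15), that is 15 cheap queries, each of which can use an index on category plus the ordering field.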