Can mongo do autoremove collections?

I heard that Mongo can do it, but I can't find how.
Can Mongo create collections which will be automatically removed in the future, at a time I can set up? Or can't Mongo do this magic?

MongoDB cannot auto-remove collections, but it can auto-remove BSON documents. You just need to set a TTL (time to live) index on a date field that exists in each document.
You can read more here: MongoDB: Expire Data from Collections by Setting TTL
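For illustration, a minimal sketch in the mongo shell (the collection and field names here are made up): create the TTL index once, then give every document a value for the date field.
db.log_events.createIndex({ "createdAt": 1 }, { expireAfterSeconds: 3600 })
db.log_events.insert({ "message": "hello", "createdAt": new Date() })
Each document is then removed roughly one hour after its createdAt value.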

Collections are auto-created on the first write operation (insert, upsert, index creation), so that part of the magic is covered.
If your removal is based on time, you could use cron or at to run this little script
mongo yourDBserver/yourDB --eval 'db.yourCollection.drop()'
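For example, a crontab entry like the following (the schedule, server, and names are placeholders) would drop the collection every night at midnight:
0 0 * * * mongo yourDBserver/yourDB --eval 'db.yourCollection.drop()'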
As Sammaye pointed out, creating indices is a costly operation. I would assume there is something wrong with your data model. For semantically distinguishing documents, I'd rather create a field on them which does that, set either an expiration date or a creation date plus a time frame in which the documents are valid, and use TTL indices to remove the expired documents.
For using an expiration date, you have to set a field to an ISODate and create a TTL index without a duration:
db.yourColl.ensureIndex({"yourExpirationDateField":1},{expireAfterSeconds:0})
In the case you want the documents to be valid for let's say a week after they are created, you would use the following:
db.yourColl.ensureIndex({"yourCreationDate":1},{expireAfterSeconds:604800})
Either way, here is what happens: Once every minute a background thread called TTLMonitor wakes up, gets all TTL indices for the server and starts processing them. It scans the TTL index, looking for the date values, adds the value given for "expireAfterSeconds" and deletes all documents which it determined to be invalid by now. This process takes some time, so don't expect the documents to be deleted on the very second they expire.
The big advantage of that approach: you don't need any triggering logic to be maintained, the deletes are done automagically in the background, and you don't put any load on your application. Plus, using an expiration date, you have very granular control over when a document expires.
The drawback is... well, if I had to find one, it would be that you have to insert a creation date for every document, or calculate and insert an expiration date. And you have to send an administrative command to the mongod/mongos once in the application's lifetime...

Related

A good way to expire specific documents in one collection and add them to another collection on expiry

I'm using Node.js and the MongoDB driver.
A two-part question:
I have a collection called openOffers whose documents I want to expire when they hit the time closeOfferAt. I know that MongoDB offers TTL indexes with expireAt and expireAfterSeconds. But when I use these, the same TTL is applied to all the documents in a particular collection. I'm not sure if I'm doing this correctly. I want document-level custom expiry. Any syntax might be very useful!
Docs in openOffers
{
  "_id": "12345",
  "data": {...},
  "closeOfferAt": "2022-02-21T23:22:34.023Z"
}
I want to push these expired documents to another collection, openOffersLog. This is so that I can retain the documents for later analysis.
Current approach:
I haven't figured out a way to have a customized TTL on each doc in openOffers. Currently I insert docs into both openOffers and openOffersLog together, and any data change in openOffers has to be separately propagated to openOffersLog to ensure consistency. There has to be a more scalable approach, I suppose.
EDIT-1:
I'm looking for some syntax logic that I can use for the above use case. If not possible with the current MongoDB driver, I'm looking for an alternative solution with NodeJS code I can experiment with. I'm new to both NodeJS and MongoDB -- so any reasoning supporting the solution would be super useful as well.
There are two ways to implement TTL indexes.
Delete after a certain amount of time - this is what you have already implemented.
Delete at a specific clock time - for the details you can visit the MongoDB Docs.
The second option fulfills your expectation.
Just set expireAfterSeconds to 0 (zero) when creating the index:
db.collection.createIndex({ "closeOfferAt": 1 }, { expireAfterSeconds: 0 })
Then just set the expiration date in closeOfferAt when inserting the document; the document will be removed at that particular timestamp.
db.collection.insert({
  "_id": "12345",
  "data": {...},
  "closeOfferAt": ISODate("2022-02-23T06:14:15.840Z")
})
Do not make your application logic depend on TTL indexes.
In your app you should have a scheduler that runs periodic tasks; some of them would move the finished offers to the other collection and delete them from the original, even in bulk, with no need for a TTL index.
To keep consistency there is nothing better than a single source of truth, so if you can, avoid deleting and only change a status flag and some timestamps.
A good use of a TTL index is to automatically clear old data after a relative long time, like one month or more. This keeps collection/indexes size in check.
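Since the question asks for Node.js code to experiment with, here is a rough sketch of such a scheduled task using the MongoDB Node.js driver (the connection string, database name, and polling interval are assumptions; the collection names are taken from the question):
const { MongoClient } = require('mongodb');

// Copy offers whose closeOfferAt has passed into openOffersLog, then delete them.
async function archiveExpiredOffers(db) {
  const expired = await db.collection('openOffers')
    .find({ closeOfferAt: { $lte: new Date() } })
    .toArray();
  if (expired.length === 0) return;

  // Insert first and delete afterwards, so a crash between the two steps
  // cannot lose documents (a retry may then see duplicate _ids, hence ordered: false).
  await db.collection('openOffersLog')
    .insertMany(expired, { ordered: false })
    .catch(() => {}); // tolerate insert errors such as duplicate _ids from a previous partial run
  await db.collection('openOffers')
    .deleteMany({ _id: { $in: expired.map((d) => d._id) } });
}

async function main() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const db = client.db('offers');
  setInterval(() => archiveExpiredOffers(db).catch(console.error), 60 * 1000);
}

main().catch(console.error);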

MongoDB TTL but to do other stuff

I have a requirement that when a date attribute field has passed, we would like to trigger two things:
to move the record to be deleted to another table;
to call a function to do other actions.
I understand a TTL index only deletes a record when its date field trips. Can I hook extra logic into it?
Thanks!
Depending on the requirements there could be quite a few ways to do this.
One way is to execute a script periodically, and run a query to filter documents that have passed a certain date value. For each of those documents, perform the migration to another table and the extra actions.
An alternative is to use MongoDB Change Streams. The trick, however, is that delete events from a change stream do not return the document itself (because it has already been deleted).
Instead, if you update a field on documents that have passed a certain date value, you can listen for the update events; for example, set a field value like expired: true.
Worth mentioning that if you're going down the route of change streams update events, you could utilise MongoDB Stitch Triggers (relying on change streams). MongoDB Stitch database triggers allow you to automatically execute Stitch functions in response to changes in your MongoDB database.
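To illustrate the change-stream route described above, here is a rough sketch with the Node.js driver (the records/archive collection names, the expired flag, and the connection details are all assumptions; note that change streams require a replica set):
const { MongoClient } = require('mongodb');

async function main() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const db = client.db('mydb');

  // Only react to updates that set the expired flag.
  const changeStream = db.collection('records').watch(
    [{ $match: { 'updateDescription.updatedFields.expired': true } }],
    { fullDocument: 'updateLookup' } // fetch the full document for each event
  );

  changeStream.on('change', async (event) => {
    const doc = event.fullDocument;
    if (!doc) return; // the document may already have been deleted
    await db.collection('archive').insertOne(doc); // 1. move it to another collection
    await db.collection('records').deleteOne({ _id: doc._id });
    // 2. call your extra function here
  });
}

main().catch(console.error);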
I suggest writing a function and calling it via a scheduler. That would be the better way to do it.

Simple system to receive millions of records and produce a single file without duplicates

I have a system where tens of client machines are sending objects to a single server. The job of the server is to aggregate all the objects (removing duplicates, of which there are many) and produce a file every hour containing the objects received during the previous hour.
I tried MongoDB for this task, and it did a good job, but there is the overhead of going over all the records at the end of each hour to produce the file. I am now thinking about gradually building the file as data is received, stopping at the end of the hour, starting a new file, and so on.
I don't need to do any searching or querying of the data, just dropping duplicates based on a key and producing a file of all the data. Also, after the first time I receive a record, its duplicates arrive within a maximum of 3 minutes.
Which system should I use? Do you recommend a different approach?
I would recommend, even though you state in your comments that you don't like the idea of it, using indexes. You can put a unique index on these fields and use that as your method of insertion.
This does, as you rightly point out, involve scanning the index; however, whichever race-condition-free route you take (really the only way to ensure no duplicates), you will need a full index scan, either by query or by index insertion.
Index insertion is probably the best route here; at the end of the day the performance makes it not really matter.
As for removing your old records, I would not use a TTL index. Instead it would be much better to just drop your collection when you're ready to receive a new batch. Not only will this be a lot faster, but it will also send the collection to $freelists instead of adding the documents from the TTL index to a deleted bucket list, potentially causing fragmentation and slowing down your system.
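Dropping a collection also drops its indexes, so the unique index has to be recreated before the next batch. A quick sketch in the mongo shell (collection and key names assumed):
db.records.drop()
db.records.ensureIndex({ key: 1 }, { unique: true })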
Consider this document:
{
  "name": "a",
  "type": "b",
  "hourtag": 10,
  "created": ISODate("2014-03-13T06:26:01.238Z")
}
Let's say we set up a unique index on this over name and type plus an hourtag property, whose value you add to the document to represent the hour of day in which it was inserted. Also add a created date if there is not something there already, and set another index on that:
db.collection.ensureIndex({ hourtag: 1, name: 1, type: 1 }, { unique: true })
db.collection.ensureIndex({ created: 1 }, { expireAfterSeconds: 7200 })
The second index is defined as a TTL index, with the expireAfterSeconds value set to 2 hours.
So you insert your documents as you go, adding the property for the "current hour" that you are in, and the duplicate items will fail to insert.
At the end of the hour, get all the documents for the "last" hour value and process them.
Using the "TTL" index, the documents you no longer need get cleaned up after their expiry time.
That's the most simple implementation I can think of. Tweak the expiry time to your own needs.
Defining the hourtag first in the index order gives you a simple search, while maintaining your "duplicate" rules.
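A rough sketch of that flow in the mongo shell (collection name and values assumed, hour arithmetic simplified and ignoring the day boundary):
// during the hour: a duplicate of (hourtag, name, type) fails on the unique index
db.collection.insert({ name: "a", type: "b", hourtag: new Date().getHours(), created: new Date() })
// at the end of the hour: fetch and process everything tagged with the previous hour
var lastHour = (new Date().getHours() + 23) % 24;
db.collection.find({ hourtag: lastHour }).forEach(function (doc) {
  printjson(doc); // write the record out to the hourly file here
});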

MongoDB: upsert multiple fields based on multiple criteria

I am new to Mongo. I wanted to atomically upsert a document based on multiple criteria. The document looks like the following:
{_id:..., hourOfTime:..., total:..., max:..., min:..., last:...}
This is basically an hourly aggregation of the number of clicks for an item identified by _id. The clicks for each item are flushed from the application to MongoDB every five seconds, so the document needs to be updated every five seconds.
Here is the case. Let's say at t=t0 we have {_id: "nike", total: 123, max: 10, min: 3, last: 9}, and then at t=t1 I get the message {_id: "nike", count: 12}. Now, for _id="nike", I need to do the following:
Increment total by 12
if max < 12, update max=12
if min > 12, update min=12
update last=12
I want all of these operations to be atomic. I am unable to express this as one single query. Any help/hint is appreciated.
This cannot be done with a single query. Here is how I would do it:
Have a field on the document called locked. Run a findAndModify to grab the document if the locked field is false, and set the locked field to the Date() at which it was locked. This will stop other application instances from modifying the document, provided they also check the locked field.
Modify the document application-side, then atomically update it, setting locked back to false.
As long as anything modifying the document runs a findAndModify on the locked field, this entire modification should be atomic.
Make sure to have a threshold at which a lock times out, so that documents do not become locked indefinitely if a machine blows up. This means that when updating the document the second time (and releasing the lock), the application should make sure that the date in the document is what it expects, to make sure it still has the lock.
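A rough sketch of this locking pattern in the mongo shell (the collection name and incoming count are made up; documents are assumed to be created with locked: false):
var doc = db.clicks.findAndModify({
  query: { _id: "nike", locked: false },
  update: { $set: { locked: new Date() } },
  new: true
});
if (doc) {
  var count = 12; // the incoming message's count
  db.clicks.update(
    { _id: doc._id, locked: doc.locked }, // only applies if we still hold the lock
    { $set: {
      total: doc.total + count,
      max: Math.max(doc.max, count),
      min: Math.min(doc.min, count),
      last: count,
      locked: false // release the lock
    } }
  );
}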

When will a mongodb document expire after it is updated?

I have a collection of documents in MongoDB, with the expireAfterSeconds property set on a date-type index.
For the sake of argument, the documents are set to expire after one hour.
When I update a document in this collection, which one of the following will happen?
a) The document will expire one hour after the original creation time.
b) The document will expire one hour after the update time.
c) The document will expire one hour after the indexed variable's time, whatever that may be.
d) None of the above
I think that it's c, but cannot find the reference to confirm it. Am I correct? Where is this documented?
[edit]: To clarify, the situation is that I'm storing password reset codes (which should expire), and I want the old codes to stop working if a new code is requested. It's not very relevant though, since I can ensure the behaviour I want is always respected by simply deleting the old transaction. This question is not about my current problem, but about Mongo's behaviour.
The correct answer is c).
The expireAfterSeconds property always requires an index on a field which contains a BSON date, because the content of this date field is used to select entries for removal.
When you want an update of a document to reset the time-to-live, also update the indexed date field to the current time.
When you want an update to not affect the TTL, just don't update the date.
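A minimal sketch of both cases in the mongo shell (collection and field names assumed, with the TTL index on createdAt):
// resets the TTL clock: the document now expires one hour after this update
db.sessions.updateOne({ _id: "abc123" }, { $set: { status: "active", createdAt: new Date() } })
// leaves the TTL untouched: the original expiry still applies
db.sessions.updateOne({ _id: "abc123" }, { $set: { status: "active" } })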
However, keep in mind that expireAfterSeconds doesn't guarantee immediate deletion of the document. The deletions are done by a background job which runs every minute. This job is low-priority and can be postponed by MongoDB when the current load is high. So when it's important for your use-case that the expire times are respected accurately to the second, you should add an additional check on the application level.
This feature is documented here: http://docs.mongodb.org/manual/tutorial/expire-data/
If you don't want to rely on the mongod daemon process for expiring documents, then it is better to create an additional createdOn field on the collection and compare it with the current timestamp to decide whether to use that document or not.
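A minimal sketch of that application-level check in the mongo shell (collection, field names, and the one-hour validity window are assumptions):
var cutoff = new Date(Date.now() - 60 * 60 * 1000); // one hour ago
var doc = db.resetCodes.findOne({ code: "abc123", createdOn: { $gt: cutoff } });
// doc is null if the record is missing or older than the validity window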