MongoDB capped collections with TTL

I have an application that allows users to chat with other users.
I only want to store X messages per conversation, AND each message must be deleted 1 month after its creation (TTL).
Does MongoDB support capped collections with TTL?
The documentation says:
https://docs.mongodb.org/manual/core/capped-collections/#automatically-remove-data-after-a-specified-period-of-time
For additional flexibility when expiring data, consider MongoDB’s TTL indexes, as described in Expire Data from Collections by Setting TTL. These indexes allow you to expire and remove data from normal collections using a special type, based on the value of a date-typed field and a TTL value for the index.
TTL Collections are not compatible with capped collections.
I think not, so is there any alternative way to accomplish this?
Thanks

Related

MongoDB atlas archive after total document reaches a certain number

I want to write a custom archive rule in mongodb that archives the data with some condition.
Let's say I have a collection A.
If the collection has more than 1000 docs, archive the oldest doc (I have createdAt) until the total document count is 1000. So basically, it should not exceed 1000.
You can implement this in a number of different ways; there is no out-of-the-box (OOTB) solution for this.
I would personally use a capped collection with its maximum document count (the max option) set to 1000; note that a capped collection also needs a size limit in bytes. This lets Mongo handle the most difficult part of your requirements. Regarding the "archiving", I would create an additional collection for archiving purposes and insert each document into both collections.
This will allow you to have a lean capped collection for your queries, and an additional "archive" collection for historical queries.
There are additional points to consider that you didn't specify: are the capped collection limitations an issue? Do you need to support updates? What is the frequency of such operations? And so on.

Is the MongoDB "_id" field always increasing?

Is the _id field in MongoDB always increasing for the next inserted document in the collection, even if we have multiple shards? So if I use collection.watch, do I always get a higher _id for the next document than for the previous one? I need this to implement a catch-up subscription without losing any documents. On every document processed from collection.watch I store its _id, and after a crash I can select all documents with _id > last_seen_id in addition to collection.watch.
Or do I have to use some sort of auto-incremented value? I don't want to, because it will hurt performance a lot and defeat the purpose of sharding.
ObjectIds increase monotonically most of the time, but this is not guaranteed. See "What does MongoDB's documentation mean when it says ObjectIDs are 'likely unique'?" and "Can a 4 byte timestamp value in MongoDb ObjectId overflow?". If you need a guaranteed monotonically increasing counter, you need to implement it yourself.
As you pointed out, this isn't trivial to implement in a distributed environment, which is why MongoDB doesn't provide it.
One possible solution:
Have a dedicated counter collection
Seed the collection with a document like {i: 1}
Issue a find-and-modify operation that uses $inc (https://docs.mongodb.com/manual/reference/operator/update/inc/) and no condition (thus affecting all documents in the collection, i.e. the one and only document, which is the counter)
Request the new document as the update result (e.g. return_document: :after in the Ruby driver, https://docs.mongodb.com/ruby-driver/master/tutorials/ruby-driver-crud-operations/#update-options)
Use the returned value as the counter
This doesn't get you a queue. If you want a queue, there are various libraries and systems that provide queues.
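The counter steps above can be sketched in Python as follows. This is an illustration under assumptions: `db` is taken to be a pymongo Database handle, the collection name counters is hypothetical, and the raw findAndModify database command is used so that the block needs no driver import (pymongo also exposes the same operation as find_one_and_update).

```python
# Sketch only: `db` is assumed to be a pymongo Database; "counters" is a hypothetical name.


def find_and_modify_command(collection_name="counters", field="i", step=1):
    """Build a findAndModify command that increments the single counter document."""
    return {
        "findAndModify": collection_name,   # command name must be the first key
        "query": {},                        # no condition: matches the only document
        "update": {"$inc": {field: step}},
        "new": True,                        # return the document after the update
    }


def next_counter_value(db, collection_name="counters", field="i"):
    """Atomically increment the counter and return its new value.

    Assumes the collection was seeded with one document such as {"i": 1};
    without it, the command matches nothing and "value" is None.
    """
    result = db.command(find_and_modify_command(collection_name, field))
    return result["value"][field]
```

Every caller funnels through the same document, so this gives ordering at the cost of making that document a write hotspot.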

Partitioning records in a collection in MongoDB

I have a use case where a set of records in a collection needs to be deleted after a specified interval of time.
For example: records older than 10 hours should be deleted every 10th hour.
We have tried deletion based on id but found it to be slow.
Is there a way to partition the records in a collection and drop a partition as and when required in Mongo?
MongoDB does not currently support partitions; there is a JIRA ticket to add this as a feature (SERVER-2097).
One solution is to leverage multiple, time-based collections, cycling collections in a similar way as you would partitions. Typically we would do this when you'd usually only be querying one or few of these time-based collections. If you would often need to read across multiple collections, you could add some wrapper code to simplify that.
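A rough sketch of the time-based collection cycling described above, in Python. The 10-hour bucket comes from the question; the records_ prefix, the helper names, and the choice to keep the current and previous bucket are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

BUCKET_HOURS = 10           # interval taken from the question
PREFIX = "records_"         # hypothetical naming scheme
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_index(ts):
    """Number of whole 10-hour buckets between the epoch and `ts` (timezone-aware)."""
    return (ts - _EPOCH) // timedelta(hours=BUCKET_HOURS)


def collection_for(ts):
    """Name of the collection that a record created at `ts` is written to."""
    return f"{PREFIX}{bucket_index(ts)}"


def expired_collections(existing_names, now, keep_buckets=2):
    """Bucket collections whose records are all older than one full interval.

    keep_buckets=2 keeps the current and the previous bucket, because the
    previous bucket can still contain records younger than 10 hours.
    """
    newest_expired = bucket_index(now) - keep_buckets
    expired = []
    for name in existing_names:
        suffix = name[len(PREFIX):]
        if name.startswith(PREFIX) and suffix.isdigit() and int(suffix) <= newest_expired:
            expired.append(name)
    return expired
```

With a pymongo Database handle `db`, a periodic job could pass `db.list_collection_names()` to `expired_collections` and call `db.drop_collection(name)` for each result; dropping a whole collection avoids deleting documents one by one.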
There are also TTL indexes, which use a background thread in the mongod server to handle the deletes for you.
Your deletes by _id may have been slow for a number of reasons, which probably warrants more investigation beyond your original question.

MongoDB sharding by collection

I have an application which creates a collection in MongoDB for every user, where a collection is expected to have at most 100,000 documents (a few "big" users are like this, while many "small" users have fewer than 10,000 documents). Now the number of users is growing and I want to shard my database. Is it possible to say "put this collection (thus this user) on this shard and that collection on that shard, but do not shard documents inside a collection further", and is it possible to do this automatically?
Edit: I'm already aware of MongoDB's standard sharding design now, but my application was scaled up from a small application for a single person's use, where a nedb datastore is created for the user. When multi-user support was added, it was an obvious choice to create a nedb datastore for every user so that many parts of my application could stay unchanged. When I migrated it to MongoDB, since one nedb datastore is the equivalent of a MongoDB collection, I used one collection per user. Given the current situation, I wonder what the quickest way (~= with the smallest change to my application and overall configuration) to solve the current performance issue is.
Sharding is done on a collection and how the sharded collection is broken up is based on the shard key (where one or more object elements from your collection make up the key).
It might be better to rethink your document design. You could have all users in one collection and then use the user id as the shard key. That would shard each user as a whole and do it automatically.
See MongoDB's Sharding documentation for more information on sharding.
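As an illustration of the single-collection design suggested above, the following Python sketch builds the two admin commands involved. The database name app, the collection name documents, and the field name userId are hypothetical, and `client` is assumed to be a pymongo MongoClient connected to a mongos router.

```python
# Sketch only: names are hypothetical; `client` is assumed to be a pymongo
# MongoClient connected to a mongos router.


def shard_by_user_commands(database="app", collection="documents", user_field="userId"):
    """Admin commands that shard one shared collection on the user id."""
    namespace = f"{database}.{collection}"
    return [
        {"enableSharding": database},
        # Ranged shard key on the user id: documents with the same user id
        # share a shard key value and therefore stay in the same chunk.
        {"shardCollection": namespace, "key": {user_field: 1}},
    ]


def apply_commands(client, commands):
    """Run each command against the admin database."""
    for command in commands:
        client.admin.command(command)
```

Every document then has to carry the userId field, and queries should include it so that mongos can route them to a single shard. For a collection that already holds data, an index starting with the shard key has to exist before shardCollection is run.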

Do a sharded collection's indexes need to start with the shard key?

I read through the sharding docs on the mongo official site.
However, I can't find an answer to these:
Do all of a sharded collection's indexes need to start with the shard key?
If I require a TTL index on a field of a sharded collection, and since compound indexes are not supported for TTL, what can I do in this case? (field != shard key)
No. You can have any non-unique index on a sharded collection (unique indexes are the exception: they must be prefixed by the shard key). However, queries which do not include the shard key will be sent to all shards. Each individual shard will then make use of any existing index, sending back its result to the mongos query router, which in turn will sort the results, if required, and send the result set back to the client. Please read Routing Process in the MongoDB docs for further details.
The TTL removal is a background process which runs on a date field. Each of your shards will spawn said background process. So you can simply create the TTL index on the date field of your choice. Each individual shard will take care of the documents which are to be deleted.
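Creating such a TTL index might look like the following Python sketch. It assumes `collection` is a pymongo Collection, uses the hypothetical field names createdAt, conversationId, and text, and approximates "1 month" as 30 days.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "1 month" is approximated as 30 days.
ONE_MONTH_SECONDS = int(timedelta(days=30).total_seconds())


def create_ttl_index(collection, field="createdAt", seconds=ONE_MONTH_SECONDS):
    """Create a single-field TTL index; `collection` is assumed to be a pymongo Collection."""
    return collection.create_index([(field, 1)], expireAfterSeconds=seconds)


def new_message(conversation_id, text, now=None):
    """Build a message document; the TTL field must hold a date, not a string."""
    return {
        "conversationId": conversation_id,  # hypothetical field name
        "text": text,                       # hypothetical field name
        "createdAt": now or datetime.now(timezone.utc),
    }
```

The background task runs periodically (about once a minute), so documents disappear shortly after they expire rather than at the exact second.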