I'm trying to understand what a capped collection is, specifically in the context of MongoDB, and how it differs from a queue.
A capped collection removes its oldest document when it reaches its limit, so that could be an issue if there is a need to process ALL documents from the capped collection.
From the MongoDB documentation: "Capped collections work in a way similar to circular buffers: once a collection fills its allocated space, it makes room for new documents by overwriting the oldest documents in the collection."
Compared to a queue:
a queue will not remove records when it is full (it could throw an exception such as out of memory)
a queue removes a record when it is dequeued - a capped collection does not remove a document when you read it; documents only leave when they are overwritten
capped collection cleanup: if the capped collection is limited to 40 documents (the max option), then when the 41st document is added, the 1st entry is removed
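The comparison above can be tried out without a database. This is only an in-memory analogy, not MongoDB code: collections.deque with maxlen stands in for a capped collection limited to 40 documents, and queue.Queue with maxsize stands in for a bounded queue. The names capped and bounded are made up for the illustration.

```python
from collections import deque
import queue

# Stand-in for a capped collection limited to 40 documents.
capped = deque(maxlen=40)
for i in range(1, 42):              # insert documents 1..41
    capped.append({"n": i})

# The 41st insert pushed out the 1st document; no error was raised.
oldest = capped[0]["n"]             # 2
newest = capped[-1]["n"]            # 41

# Reading does not consume anything: all 40 documents are still there.
_ = list(capped)
size_after_read = len(capped)       # 40

# Stand-in for a bounded queue: a full queue refuses the insert instead.
bounded = queue.Queue(maxsize=40)
for i in range(1, 41):
    bounded.put_nowait({"n": i})
try:
    bounded.put_nowait({"n": 41})
    rejected = False
except queue.Full:
    rejected = True

# Dequeuing removes the record from the queue.
first = bounded.get_nowait()["n"]   # 1
remaining = bounded.qsize()         # 39
```

So if every document must be processed, the consumer has to keep up with the writers, because the buffer silently drops the oldest entries.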
I think these are the most important points - any comments welcome!
A capped collection in MongoDB is an implementation of a circular buffer.
From the official documentation:
Capped collections are fixed-size collections that support high-throughput operations that insert and retrieve documents based on insertion order. Capped collections work in a way similar to circular buffers: once a collection fills its allocated space, it makes room for new documents by overwriting the oldest documents in the collection.
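For reference, here is a minimal sketch of how such a fixed-size collection might be created from Python. It assumes the pymongo driver and a Database object obtained from MongoClient; the helper name create_capped, the database name sensors and the collection name readings are made up for the example. The capped, size and max options are the ones described in the capped collection documentation.

```python
def create_capped(db, name, size_bytes, max_docs=None):
    """Create a capped collection on a pymongo Database object.

    size_bytes is the required limit in bytes; max_docs is an optional
    limit on the number of documents. Both parameter names are illustrative.
    """
    options = {"capped": True, "size": size_bytes}
    if max_docs is not None:
        options["max"] = max_docs
    return db.create_collection(name, **options)


# Usage against a running server (assumed here to be on localhost):
#   from pymongo import MongoClient
#   db = MongoClient("mongodb://localhost:27017")["sensors"]
#   readings = create_capped(db, "readings", size_bytes=1024 * 1024, max_docs=40)
#   readings.options()  # reports the capped/size/max settings
```

The byte size is always required; the document count is an additional cap, and whichever limit is reached first triggers the overwrite of the oldest documents.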
I have recently been working on a time series project that stores sensor data. To achieve maximum insert/write throughput I used a capped collection (according to the MongoDB documentation, capped collections improve read/write performance). When I tested inserting some thousands of documents with the Python driver into a capped collection without an index and into a normal collection, I couldn't see much difference in write performance. For example, I inserted 40K records on a single thread using the pymongo driver: the capped collection took around 25.4 seconds and the normal collection took 25.7 seconds.
Could anyone please explain when a capped collection achieves its maximum insert/write throughput? Is it the right choice for time series data?
Data stored in a capped collection is rotated once the collection exceeds its fixed size.
Capped collections don't require any indexes because they preserve insertion order, and data is retrieved in natural order, which is the order in which the database refers to documents on disk. Hence they offer high performance for inserts and retrieval.
For a more detailed description of capped collections, please refer to the documentation at this URL:
https://docs.mongodb.com/manual/core/capped-collections/
Can we save new records in descending order in MongoDB, so that the first saved document is returned last by a find query? I do not want to use $sort, so the data should be stored in descending order in advance.
Is it possible?
As an alternative, if you do not want to use $sort, you can create a capped collection, which maintains the insertion order of the documents in the collection.
For a more detailed description of capped collections in MongoDB, please refer to the documentation at the following URL:
https://docs.mongodb.org/manual/core/capped-collections/
But please note that capped collections are fixed-size collections, so old documents are automatically overwritten once the data exceeds the size of the capped collection.
The order of the records is not guaranteed by MongoDB unless you add a $sort operator. Even if the records happen to be ordered on disk, there is no guarantee that MongoDB will always return the records in the same order. MongoDB does quite a bit of work under the hood and as your data grows in size, the query optimiser may pick a different execution plan and return the data in a different order.
I want to keep track of the documents removed from a capped collection. Is there any way to know when a document has been removed from the capped collection? I want to maintain a list of the removed documents. Please help me.
I wonder if capped collections keep indexes for expired documents?
Removing documents from normal collection keeps indexes.
Capped collections remove documents by timer and do not allow db.collection.remove() at all.
I could not find anything in the docs about what happens to indexes for capped collections and would appreciate help from anyone who knows.
TL;DR: The only way to remove documents from a capped collection is to drop the entire collection, which also removes the collection's indexes.
I wonder if capped collections keep indexes for expired documents?
No. Documents that are no longer stored never remain in the index.
Removing documents from normal collection keeps indexes.
This is a bit misleading. Removing all documents from a normal collection by using db.collection.remove() removes the documents from the collection and also deletes their entries from the indexes. It does not, however, remove the indexes of the collection: once you add new documents, they are added to the respective indexes again (removing an index itself is different from deleting documents from the index).
Capped collections remove documents by timer and do not allow db.collection.remove() at all.
The TTL-feature you linked has nothing to do with capped collections, in fact, the documentation says:
You cannot create a TTL index on a capped collection, because MongoDB cannot remove documents from a capped collection.
A collection with a TTL index does allow db.collection.remove.
A capped collection, on the other hand, has a fixed size (in terms of data size) and the oldest documents of the collection are automatically overwritten once the collection is full. This is not based on time, but purely on the size of the collection. Capped collections are always kept in insertion order (natural order).
Since the only way to remove documents from a capped collection is to drop the entire collection, doing so also removes the indexes themselves.
I have a very large capped collection in mongodb. Given that the capped collection structure is predictable (i.e. sort is predefined, memory footprint is predefined, etc), is there a better way to get a cursor on the LATEST item inserted instead of iterating?
In other words, what I'm doing right now is to get the size of my collection (n), and then create a cursor that sets skip=n-1 to put me at the end of the collection. Then I iterate on the cursor and handle all new additions to the collection.
The problem with this approach is that my collection is huge, let's say 11M records, and that takes 20 minutes to skip. This means that when my cursor starts emitting data, it's 20 minutes behind.
Try db.cappedCollection.find().limit(1).sort({$natural:-1}).
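A rough Python equivalent of that shell query, assuming the pymongo driver, could look like the sketch below. The helper name latest_document and the database and collection names in the usage comment are made up; the function only relies on the find(), sort() and limit() calls of a pymongo collection.

```python
def latest_document(coll):
    """Return the most recently inserted document, or None if there is none.

    coll is expected to be a pymongo Collection. Sorting on $natural in
    descending order walks the collection from its newest end.
    """
    cursor = coll.find().sort("$natural", -1).limit(1)
    return next(iter(cursor), None)


# Usage against a running server (assumed here to be on localhost):
#   from pymongo import MongoClient
#   coll = MongoClient("mongodb://localhost:27017")["mydb"]["cappedCollection"]
#   newest = latest_document(coll)
```

This avoids the skip=n-1 scan because the server starts reading from the newest end instead of walking past every older document.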
Have you tried indexing the collection and using $gt? This should be faster, although the index will have some impact on the speed of writes to the collection.
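A minimal sketch of the $gt idea with pymongo follows. It assumes the indexed field is _id and that _id values grow with insertion order, which is an assumption about the data rather than something MongoDB guarantees in every setup; the helper name documents_after is made up.

```python
def documents_after(coll, last_id):
    """Return a cursor over documents whose _id is greater than last_id.

    coll is expected to be a pymongo Collection. The caller remembers the
    _id of the last document it handled and passes it in on the next poll.
    """
    return coll.find({"_id": {"$gt": last_id}}).sort("_id", 1)


# Usage (hypothetical polling loop; handle() is a placeholder):
#   last_id = start_id
#   for doc in documents_after(coll, last_id):
#       handle(doc)
#       last_id = doc["_id"]
```

Each poll then touches only the documents added since the previous one, at the cost of maintaining the index on every write.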