What does db.oplog.rs.stats().count mean in MongoDB?

In MongoDB's local database, you can check oplog-related data by running db.oplog.rs.stats().
But what does the "count" field mean? And I see it's decreasing every second in my db server.

The replica set oplog (oplog.rs) is a capped collection, which means it has a maximum total size for data. The underlying implementation varies by storage engine (e.g. WiredTiger vs MMAPv1) but the conceptual outcome is the same: capped collections make room for new documents by overwriting or expiring the oldest documents in the collection in FIFO order (First In, First Out).
But what does the "count" field mean?
As with any collection, the count information in db.collection.stats() indicates the number of documents currently in the collection.
For an explanation of collection stats output, see collStats in the MongoDB documentation.
Note: The output will vary depending on your version of MongoDB server and storage engine used.
I see it's decreasing every second in my db server.
The count of documents in the oplog will vary over time based on the size of the write operations being applied, so this is expected to fluctuate for an active deployment. For example, single field updates will generally write smaller oplog entries than full document updates. Once your oplog reaches its maximum data size, the count may also decrease as the oldest oplog documents are removed to make room for new oplog entries.
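If you want to see this for yourself, the following mongosh sketch (run against the local database of a replica set member) prints the fields discussed above; exact field availability varies by server version and storage engine.

    // Run against the "local" database of a replica set member.
    // Field availability varies by MongoDB version and storage engine.
    var stats = db.getSiblingDB("local").oplog.rs.stats();
    print("capped:     " + stats.capped);     // the oplog is always a capped collection
    print("count:      " + stats.count);      // number of oplog entries currently stored
    print("size:       " + stats.size);       // current data size in bytes
    print("maxSize:    " + stats.maxSize);    // configured maximum size in bytes
    print("avgObjSize: " + stats.avgObjSize); // average oplog entry size in bytes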

Related

Why am I seeing fewer documents in my MongoDB Atlas than it shows when I use collection.count()?

[Screenshot: document count for each collection]
As you can see, the number of documents is 3. When I click the collection, though, I see only 2 documents.
[Screenshot: collection view showing fewer documents]
I even used the mongo shell and got the same count.
[Screenshot: mongosh output of the document count]
When I use the mongo shell command db.call_logs.find(), all three are printed. Why is it not available on MongoDB Atlas and also when I query it on my Node.js application?
When I click the collection, though, I see only 2 documents.
Why is it not available on MongoDB Atlas
You've shared a small screenshot, apparently from the Atlas UI, to demonstrate the discrepancy. Based on that, I suspect that the behavior you are observing may be what is documented here:
The Atlas UI limits the total byte size of documents shown per page. As a result, you may see varying numbers of documents per page, especially if your documents vary significantly in size.
From your first screenshot we can see that the logical data size of the call_logs collection is 14MB, meaning that the average size of the 3 documents is close to 5MB. It seems that your callLogs array field may contain a significant amount of data. Therefore the UI is probably hitting the display size threshold after just 2 documents resulting in the behavior that you are observing.
If you remove that callLogs field from the results returned to the UI (by projecting it out), then it may display all 3 documents. In any case, the data is not missing from the collection; it is simply not being displayed all at once.
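For example, a quick check in mongosh (using the collection and field names from your screenshots) might look like this:

    // Confirm all documents are present, then fetch them without the
    // large callLogs array so each returned document stays small.
    db.call_logs.countDocuments({})         // should report 3
    db.call_logs.find({}, { callLogs: 0 })  // projection drops the bulky field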
Why is it not available ... when I query it on my Node.js application.
Have you observed the data not being available in the Node.js application? I don't think there is any evidence of this anywhere in your question.
As an aside, MongoDB documents have a strict size limit of 16MB. You may wish to review (and potentially modify) your schema to ensure that you don't run into this limit at some point in the future.
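If you want to know how close your documents are to that limit, one option (assuming MongoDB 4.4 or newer) is an aggregation with the $bsonSize operator; the collection name below is just the one from your question.

    // Report the BSON size of each document, largest first.
    // $bsonSize requires MongoDB 4.4+.
    db.call_logs.aggregate([
      { $project: { sizeBytes: { $bsonSize: "$$ROOT" } } },
      { $sort: { sizeBytes: -1 } }
    ])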

Why is the oplog decreasing in MongoDB?

What type of query generates a decrease in the oplog? How do I find out which queries are affecting my oplog decrease (when comparing month to month)? Could you help me?
The oplog window is affected by heavy write workloads, and particularly by mass deletes of large documents, since a single range delete operation is broken down in the oplog into multiple individual delete entries.
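To watch how your oplog window evolves (for example, when comparing month to month), you can sample the replication info periodically from mongosh; a rough sketch:

    // Human-readable summary: configured size, used size, and the time span
    // between the first and last oplog entries (i.e. the oplog window).
    rs.printReplicationInfo();

    // The same data as an object, convenient for periodic sampling/scripting:
    var info = db.getReplicationInfo();
    print("oplog window (seconds):       " + info.timeDiff);
    print("oplog used / configured (MB): " + info.usedMB + " / " + info.logSizeMB);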

How to shrink MongoDB's oplog.rs collection?

After storing some binary data in MongoDB 4.2.5 (3-node replica set), the oplog.rs collection grew to ca. 700MB. The binary data was removed and the data model restructured, but the oplog.rs collection stays the same size (as expected). I do understand that it's a capped collection with a maximum size and eventually it'll reuse the space. In my case though, I'd like to reclaim the space and start over. The database is used mostly for internal testing purposes. I don't mind losing some data from the oplog, but I do mind having a big oplog file, since the whole database is just a few MB.
Is it safe to use the emptycapped command on the oplog.rs collection in a replica set scenario? Do I need to run this command on each node? Do I need to compact the collection after the deletion (last part from https://docs.mongodb.com/manual/tutorial/change-oplog-size/)?
Is there any other way to gracefully "reset" the oplog and free up the space?
The OpLog is limited to the size you have defined in your config, or to the default if you have not set one.
The OpLog (operations log) is a special capped collection that keeps a rolling record of all operations that modify the data stored in your databases.
It fills up to the defined size as changes come through (or as no-op heartbeats are written).
If you want to reduce the size, reset the OpLog size in your config. But don't forget: a larger OpLog size means you get a longer OpLog window.
OpLog Window tells you how long a secondary member can be offline and still catch up to the primary without doing a full resync.
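If the goal is to actually reclaim disk space rather than wait for reuse, the procedure in the change-oplog-size tutorial linked in the question is the usual route: shrink the oplog with replSetResizeOplog (MongoDB 4.0+) and then, if needed, compact oplog.rs on each member in turn. A hedged mongosh sketch, with the tutorial's caveats still applying (handle members one at a time, and avoid compacting a primary without stepping it down first):

    // 1. Reduce the maximum oplog size (value in MB; 990 MB is the minimum allowed).
    db.adminCommand({ replSetResizeOplog: 1, size: 990 });

    // 2. Optionally compact oplog.rs to return the freed space to the OS.
    //    See the linked tutorial for caveats before running this.
    db.getSiblingDB("local").runCommand({ compact: "oplog.rs" });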

MongoDB capped collection performance

I am currently working on a time series data project to store some sensor data. To achieve maximum insertion/write throughput I used a capped collection (as per the MongoDB documentation, capped collections increase read/write performance). When I tested inserting/writing a few thousand documents/records using the Python driver, comparing a capped collection without an index against a normal collection, I couldn't see much improvement in the write performance of the capped collection over the normal collection. For example, I inserted 40K records on a single thread using the PyMongo driver: the capped collection took around 25.4 seconds and the normal collection took 25.7 seconds.
Could anyone please explain to me when we can achieve maximum insertion/write throughput with a capped collection? Is this the right choice for time series data collections?
Data stored in a capped collection is rotated out once the collection exceeds its fixed size.
Capped collections don't require any indexes because they preserve insertion order, and data is retrieved in natural order, the same order in which the database refers to documents on disk. Hence they offer high performance for insertion and data retrieval.
For a more detailed description of capped collections, please refer to the documentation:
https://docs.mongodb.com/manual/core/capped-collections/
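For reference, here is a minimal mongosh sketch of a capped collection; the collection name, cap size, and document shape are just placeholders.

    // Create a capped collection with a 10 MB cap.
    db.createCollection("sensor_data", { capped: true, size: 10 * 1024 * 1024 });

    // Inserts are appended in order; once the cap is reached the oldest
    // documents are overwritten automatically.
    db.sensor_data.insertMany([
      { sensor: "s1", temp: 21.4, ts: new Date() },
      { sensor: "s1", temp: 21.6, ts: new Date() }
    ]);

    // Reading back in natural (insertion) order needs no index.
    db.sensor_data.find().sort({ $natural: 1 });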

Limit the number of documents in a mongodb collection, without FIFO policy

I'm building an application to handle ticket sales and expect to have really high demand. I want to try using MongoDB with multiple concurrent client nodes serving a node.js website (and gracefully handle failure of clients).
I've read "Limit the number of documents in a collection in mongodb" (which is completely unrelated) and "Is there a way to limit the number of records in certain collection" (but that talks about capped collections, where the new documents overwrite the oldest documents).
Is it possible to limit the number of documents in a collection to some maximum size, and have documents after that limit just be rejected? The simple example is adding ticket sales to the database, then failing if all the tickets are already sold out.
I considered having a NumberRemaining document, which I could atomically decrement until it reaches 0, but that leaves me with a problem if a node crashes between decrementing that number and saving the purchase of the ticket.
Store the tickets in a single MongoDB document. Since you can only atomically update one document at a time, you shouldn't have the cross-document dependency problems that would otherwise call for a traditional transactional database system.
As a document can be up to 16MB, by storing only a ticket_id in a master document you should be able to store plenty of tickets without any extra complex document management. While it could introduce a hot spot, the document likely won't be very large. If it does get large, you could use more than one document (splitting into multiple documents: as one document "fills", activate another).
If that doesn't work, 10gen has a pattern that might fit.
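One way to sketch that idea in mongosh (the collection name, field names, and the 100-ticket cap are all hypothetical) is to push a ticket id into the master document only while the array is still below the cap:

    // Hypothetical event document holding sold ticket ids in a single array.
    db.events.insertOne({ _id: "concert-42", ticket_ids: [] });

    // Atomically sell one ticket, but only if fewer than 100 are already sold:
    // if "ticket_ids.99" exists, the array already holds 100 elements.
    var res = db.events.updateOne(
      { _id: "concert-42", "ticket_ids.99": { $exists: false } },
      { $push: { ticket_ids: "ticket-0001" } }
    );
    if (res.modifiedCount === 0) {
      print("sold out (or event not found)");
    }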
My only solution so far (I'm hoping someone can improve on this):
Insert documents into an un-capped collection as they arrive. Keep the implicit _id value of ObjectID, which can be sorted and will therefore order the documents by when they were added.
Run all queries ordered by _id and limited to the max number of documents.
To determine whether an insert was "successful", run an additional query that checks that the newly inserted document is within the maximum number of documents.
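A rough mongosh sketch of that approach (names and the limit are hypothetical): insert the document, count how many documents sort at or before its _id, and treat the insert as rejected (and remove it) if that position exceeds the limit.

    var MAX_TICKETS = 100; // hypothetical limit

    // 1. Insert the purchase; ObjectId _ids sort roughly by insertion time.
    var inserted = db.ticket_sales.insertOne({ buyer: "alice", ts: new Date() });

    // 2. Find this document's position in _id order.
    var rank = db.ticket_sales.countDocuments({ _id: { $lte: inserted.insertedId } });

    if (rank > MAX_TICKETS) {
      // Over the limit: undo the insert and report failure.
      db.ticket_sales.deleteOne({ _id: inserted.insertedId });
      print("purchase rejected: sold out");
    } else {
      print("purchase accepted (position " + rank + " of " + MAX_TICKETS + ")");
    }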
My solution was: I use an extra count variable in another collection. This collection has a validation rule that prevents the count from becoming negative; the count should always be a non-negative integer.
"count": { "$gte": 0 }
The algorithm is simple: decrement the count by one. If it succeeds, insert the document. If it fails, it means there is no space left.
Vice versa for deletion.
You can also use transactions to prevent partial failures (e.g. the count is decremented but the service fails just before the insert operation).
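A minimal mongosh sketch of that counter approach (collection and field names are made up): the validation rule rejects any update that would drive the count below zero, so a failed decrement means capacity is exhausted.

    // Documents in this collection must always satisfy count >= 0.
    db.createCollection("ticket_counters", { validator: { count: { $gte: 0 } } });
    db.ticket_counters.insertOne({ _id: "concert-42", count: 100 });

    // Try to reserve one ticket by decrementing the counter. If the result
    // would violate the validator, the update fails and no ticket is inserted.
    try {
      db.ticket_counters.updateOne({ _id: "concert-42" }, { $inc: { count: -1 } });
      // ...insert the ticket document here (ideally inside a transaction)...
    } catch (e) {
      print("no tickets left: " + e);
    }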