Is there a way to count reads/writes per collection in mongodb? - mongodb

Is there a way to view the reads/writes in MongoDB on a per-collection basis? I would like to see how many documents have been read and written on a specific collection.
We are currently researching the costs of some specific queries and are trying to find out whether they are heavy read and/or write tasks.
Thank you :-)

Yes, you can. Running db.collection.stats() will return the size and count of documents, index information, and a lot of other useful information. But you want to count the number of reads and writes performed on a specific collection. For that you can use mongostat. It captures and returns the counts of database operations by type (e.g. insert, query, update, delete), and these counts report on the load distribution on the server. Read more about mongostat in its documentation: https://docs.mongodb.com/manual/reference/program/mongostat/#bin.mongostat
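For illustration, a minimal sketch (the collection name orders and the host/port are placeholders):

// In the mongo shell: size, document count, and index details for one collection
db.orders.stats()

# From the command line: server-wide operation counters (insert, query, update, delete),
# refreshed every 5 seconds
mongostat --host localhost:27017 5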

Related

Does Firestore's 500 writes/second limit apply to updates to non-sequential fields in documents with indexed sequential fields?

Let's say we have the following database structure:
[collection]
<documentId>
- indexedSequentialField
- indexedNonSequentialField
- nonIndexedSequentialField
Firestore's 500 writes/second limit will apply to the creation of new documents if indexedSequentialField is present at creation time. Similarly, Firestore's 500 writes/second limit should also apply to any updates to the documents that change indexedSequentialField, because that involves rewriting the index entry. This part is clear.
My understanding is that this limit comes from writing the index entries and not to the collection itself.
If that's true, would it be correct to say that making more than 500 updates per second to documents that change only indexedNonSequentialField or nonIndexedSequentialField is fine as long as indexedSequentialField is not changed, even if indexedSequentialField has been present in the documents and the index entries since their creation?
For the sake of this question, please assume that there are no composite indices present that end up being sequential in nature.
Firestore's hotspots on writes occur when it needs to write data from multiple write operations close to each other on disk, as it needs to synchronize those writes across multiple data centers while also isolating each write from the others (since its write operations are immediately consistent).
If your collection or collection group has an index with sequential fields, that will trigger a hotspot indeed. Note that the limit of 500 writes per second is a soft limit, and you may well be able to write much more than that before hitting a hotspot. Nowadays I recommend using the Key Visualizer to analyze the performance of your writes.
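To make the scenario concrete, here is a minimal sketch using the Firebase Admin SDK for Node.js (the collection name items and the document ID docId are placeholders); it shows an update that leaves indexedSequentialField untouched, which is the case the question asks about:

const admin = require('firebase-admin');
admin.initializeApp();
const db = admin.firestore();

// Only the non-sequential fields change; indexedSequentialField is left as-is,
// so no sequential index entry has to be rewritten for this write.
db.collection('items').doc('docId').update({
  indexedNonSequentialField: 'new value',
  nonIndexedSequentialField: Date.now(),
}).then(() => console.log('updated'));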

MongoDB atlas archive after total document reaches a certain number

I want to write a custom archive rule in MongoDB that archives data based on some condition.
Let's say I have a collection A.
If the collection has more than 1000 docs, archive the oldest doc (I have createdAt) until the total document count is 1000. So basically, it should not exceed 1000.
You can implement this in a number of different ways; there is no out-of-the-box (OOTB) solution for it.
I would personally use a capped collection with max set to 1000 documents; this will let Mongo handle the most difficult part of your requirement. Regarding the "archiving", I would create an additional collection for archiving purposes and insert each document into both collections.
This will allow you to have a lean capped collection for your queries, and an additional "archive" collection for historical queries.
There are additional points to consider that you didn't specify: are the capped collection limitations an issue? Do we need to support updates? What is the frequency of such operations? And so on.
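A minimal sketch of that setup in the mongo shell (the collection names and the byte size are placeholders; a capped collection needs a size in bytes even when max is set):

// Lean capped collection: MongoDB evicts the oldest documents once a limit is reached.
db.createCollection("A", { capped: true, size: 10 * 1024 * 1024, max: 1000 })

// Plain collection that keeps the full history for archival queries.
db.createCollection("A_archive")

// On every write, insert the document into both collections.
const doc = { createdAt: new Date(), payload: "..." };
db.A.insertOne(doc);
db.A_archive.insertOne(doc);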

Is it worth splitting one collection into many in MongoDB to speed up querying records?

I have a query on a collection where I filter by one field. I thought I could speed up the query if, based on this field, I created many separate collections whose names incorporate the field I previously filtered on. Practically, I could then remove the filter component from the query, because I would only need to pick the right collection and return its documents as the response. But this way documents will be stored redundantly: a document that was previously stored only once might now be stored in several collections. Is this approach worth following?
I use Heroku as the cloud provider; by increasing the number of dynos, it is easy to serve more user requests. As far as I know, read operations in MongoDB are highly concurrent and executed in parallel, and locking occurs at the document level. Is it possible to gain any advantage by increasing redundancy? Of course, an index exists for that field.
If it's still within the same server, I believe there may be little parallelization gain (from the database side) in doing it this way, because for a single server, it matters little how your document is logically structured.
All the server cares about is how many collections and indexes you have, since it stores those collections and associated indexes in a number of files. It will need to load these files as the collections are accessed.
What could potentially be an issue is if you have a massive number of collections as a result, where you could hit the open file limit. Note that the open file limit is also shared with connections, so with a lot of collections, you're indirectly reducing the number of possible connections.
For illustration, let's say you have a big collection with e.g. 5 secondary indexes on it. The WiredTiger storage engine stores the collection as:
1 file containing the collection data
1 file containing the _id index
5 files containing the 5 secondary indexes
Total = 7 files.
Now you split this one collection across e.g. 100 collections. Assuming each of those collections also requires 5 secondary indexes, in total they will need 700 files in WiredTiger (vs. the original 7). This may or may not be desirable from your ops point of view.
If you require more parallelization because you're hitting some ops limit, then sharding is the recommended method. Sharding the busy collection across many different shards (servers) will immediately give you better parallelization vs. a single server/replica set, given a properly chosen shard key designed to maximize parallelization.
Having said that, sharding also requires more infrastructure and may complicate your backup/restore process. It will also require considerable planning and testing to ensure your design is optimal for your use case, and will scale well into the future.
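For reference, a minimal sketch of the sharding route in the mongo shell (mydb, busyColl, and the userId shard key are placeholders; choosing the shard key is the part that needs the planning mentioned above):

// Run against a mongos of an existing sharded cluster.
sh.enableSharding("mydb")

// A hashed shard key spreads the busy collection's reads/writes across shards.
sh.shardCollection("mydb.busyColl", { userId: "hashed" })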

How to select all documents in MongoDB collection by parallel processes?

I have multiple worker processes which select data from huge mongodb collection and performs some complex calculations.
Each document from MongoDB collection should be processed only once.
Now I'm using the following technique: each worker marks and selects documents to process with the .FindOneAndUpdate method. It finds an unmarked document, marks it, and returns it to the worker. FindOneAndUpdate (findAndModify) is an atomic operation, so each document is selected only once.
Selecting documents one by one does not look very efficient. Is there some way to select documents in batches of 100 and still be sure each document will be processed only once?
Is there some other, maybe MongoDB-specific, way to process a huge number of documents in parallel?
Interesting...
One way to solve that is by implementing segments for your data. Let's say you have 1M documents in your collection and 100 workers: find a field in your structure that can be divided roughly equally and pre-assign 10K documents to each worker.
But that process may be overkill, and its efficiency may not really be better than querying and processing the documents individually. If you set an index on your marked field, the operation should be quite efficient, as Mongo will know where to look for unmarked documents.
I think the safest way to do what you need is actually processing them one by one. Mongo's atomicity is at a document level, so you may not find a way to lock several specific documents at the same time. $isolated operator may help you in case that you may find a good way to segment the data for your workers.
This other answer has useful links regarding atomicity and the $isolated operator.
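For reference, a minimal sketch of the one-by-one claim pattern in the mongo shell (collection and field names are placeholders), with the index on the marker field mentioned above:

// Index on the marker field so unclaimed documents are found cheaply.
db.tasks.createIndex({ claimedBy: 1 })

// Each worker atomically claims one unclaimed document; two workers can never get the same one.
const doc = db.tasks.findOneAndUpdate(
  { claimedBy: null },
  { $set: { claimedBy: "worker-42", claimedAt: new Date() } },
  { returnNewDocument: true }
);
// doc is null once there is nothing left to claim.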

MongoDB -- large number of documents

This is related to my last question.
We have an app where we are storing large amounts of data per user. Because of the nature of the data, previously we decided to create a new database for each user. This would have required a large no. of databases (probably millions) -- and as someone pointed out in a comment, this indicated a wrong design.
So we changed the design and now we are thinking about storing each user's entire information in one collection. This means one collection exactly maps to one user. Since there are 12,000 collections available per database, we can store 12,000 users per DB (and this limit could be increased).
But, now my question is -- is there any limit on the no. of documents a collection can have? Because of the way we need to store data per user, we expect to have a huge (tens of millions in extreme cases) no. of documents per collection. Is that OK for MongoDB and design-wise?
EDIT
Thanks for the answers. I guess then it's OK to use large no of documents per collection.
The app is a specialized inventory control system. Each user has a large no. of little pieces of information related to them. Each piece of information has a category and some related stuff under that category. Moreover, no two collections need to see each other's data -- hence an index that touches more than one collection is not needed.
To adjust the number of collections/indexes you can have (~24k is the limit -- ~12k is what they say for collections because you have the _id index by default, but keep in mind that if you have more indexes on the collections, that will use up namespaces as well), you can use the --nssize option when you start up mongod.
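For reference, a minimal sketch of that startup option (128 is just an example value in MB, and the data path is a placeholder; this applies to the storage setup the answer describes, where the default namespace file was 16 MB):

# Start mongod with a larger namespace file to allow more collections and indexes
mongod --nssize 128 --dbpath /data/db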
There are plenty of implementations around with billions of documents in a collection (and I'm sure there are several with trillions), so "tens of millions" should be fine. There are some numbers such as counts returned that have constraints of 64 bits, so after you hit 2^64 documents you might find some issues.
What sort of query and update load are you going to be looking at?
Your design still doesn't make much sense. Why store each user in a separate collection?
What indexes do you have on the data? If you are indexing by some field that has content that's common across all the users you'll get a significant saving in total index size by having a single collection with one index.
Index size is often the limiting factor, not total database size, when it comes to performance.
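To make that concrete, a minimal sketch in the mongo shell (collection and field names are placeholders): one collection holding every user's items, with a single compound index serving per-user queries, instead of one collection and one set of index files per user.

// One collection for all users' inventory data, with one shared compound index.
db.items.createIndex({ userId: 1, category: 1 })

// A typical per-user query uses the index prefix.
db.items.find({ userId: "user123", category: "widgets" })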
Why do you have so many documents per user? How large are they?
Craigslist put 2+ billion documents in MongoDB so that shouldn't be an issue if you have the hardware to support it and aren't being inefficient with your indexes.
If you posted more of your schema here you'd probably get better advice.