This is about a recommendation on MongoDB. I have a collection whose document count keeps growing; it is at about 5 billion now. When I run a query on this collection I sometimes get an error about the 16 MB size limit.
The first thing I want to ask is: what is the best way to structure collections whose document count grows enormously? What is the best approach, and what should I do for this kind of structure and for performance?
Just to clarify, the 16 MB limitation is on documents, not collections. Specifically, it is the maximum BSON document size, as specified on this page of the documentation:
The maximum BSON document size is 16 megabytes.
If you're running into the 16 MB limit in aggregation, it is because you are using MongoDB version 2.4 or older. In those versions, the aggregate() method returned a single document, which is subject to the same limitation as all other documents. Starting in 2.6, the aggregate() method returns a cursor, which is not subject to the 16 MB limit. For more information, consult this page in the documentation. Note that each stage in the aggregation pipeline is still limited to 100 MB of RAM.
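For example, with the Python driver (a minimal sketch; the events collection and database names are made up), iterating the cursor returned by aggregate() on 2.6+ streams results in batches instead of packing them into a single 16 MB-limited response document:

    # Minimal sketch: on MongoDB 2.6+, aggregate() returns a cursor, so
    # results are streamed in batches rather than returned as one document.
    # Collection name "events" and database "mydb" are hypothetical.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    db = client["mydb"]

    pipeline = [
        {"$match": {"status": "active"}},
        {"$group": {"_id": "$category", "count": {"$sum": 1}}},
    ]

    # Each individual result document must still fit within 16 MB.
    for doc in db.events.aggregate(pipeline):
        print(doc)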
Related
I am currently working on a time series data project to store some sensor data. To achieve maximum insertion/write throughput I used a capped collection (as per the MongoDB documentation, capped collections increase read/write performance). When I tested inserting a few thousand documents using the Python driver, comparing a capped collection without an index against a normal collection, I couldn't see much improvement in write performance for the capped collection. For example, I inserted 40K records on a single thread using the pymongo driver: the capped collection took around 25.4 seconds and the normal collection took 25.7 seconds.
Could anyone please explain when we can achieve maximum insertion/write throughput with a capped collection? Is it the right choice for time series data collections?
Data stored in a capped collection is rotated once the collection's fixed size is exceeded: the oldest documents are overwritten by new ones.
Capped collections don't require any indexes because they preserve insertion order, and data is retrieved in natural order, the same order in which the database stores the documents on disk. Hence they offer high performance for insertion and retrieval.
For a more detailed description of capped collections, please refer to the documentation:
https://docs.mongodb.com/manual/core/capped-collections/
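As a point of comparison, here is a minimal sketch (the collection and database names are made up) of creating a capped collection and batching the inserts. Single-document inserts on any collection type tend to be dominated by per-request round trips, which is one reason the 40K-document test above shows so little difference:

    # Sketch: capped collection for sensor data, capped at a fixed byte size;
    # once full, the oldest documents are overwritten in insertion order.
    from datetime import datetime, timezone
    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017")["telemetry"]

    # Fails if the collection already exists.
    db.create_collection("sensor_data", capped=True, size=512 * 1024 * 1024)

    # Batching with insert_many (rather than 40K single inserts) is usually
    # what exposes throughput differences in this kind of benchmark.
    docs = [
        {"sensor_id": i % 100, "value": 42.0, "ts": datetime.now(timezone.utc)}
        for i in range(40_000)
    ]
    db.sensor_data.insert_many(docs, ordered=False)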
I am planning to store a huge number of foreign-key ids in this array, so I am checking what the maximum number of BSON::ObjectIds is that I can save in the array field. For example:
department_ids: [BSON::ObjectId('57cf6d6e8315292136000001'), BSON::ObjectId('57cf6d6e8315292136000002') ...... ]
16 MB is big enough to hold a really large number of ObjectIds. ObjectIds aren't that heavy: they are 12 bytes each, and dividing 16 MB by 12 bytes gives well beyond 1 million.
But in case you still aren't reassured, you can take advantage of Mongo's flexible schema design and create one follow-up document to hold further arrays, storing the _id of that document in the original document under a field named "followedBy" or similar.
The downside is that you will have to execute a follow-up query (or maybe not).
Hope that helps.
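For what it's worth, a rough sketch of that follow-up document idea (the collection and field names here are just placeholders):

    # Spill extra ids into a separate "overflow" document and link it from
    # the main document via a followedBy field, as described above.
    from bson import ObjectId
    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017")["hr"]

    first_batch = [ObjectId() for _ in range(1000)]
    overflow_batch = [ObjectId() for _ in range(1000)]

    overflow_id = db.employee_overflow.insert_one(
        {"department_ids": overflow_batch}
    ).inserted_id

    db.employees.insert_one({
        "name": "example",
        "department_ids": first_batch,
        "followedBy": overflow_id,  # a second query fetches the rest
    })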
No such limit is mentioned in MongoDB's documentation, but a JavaScript array can have up to 2^32 - 1 = 4,294,967,295 ≈ 4.29 billion elements.
And a MongoDB document can be at most 16 MB.
Every ObjectId uses 12 bytes; with the 16 MB limit, you could have approximately 1,398,101 ObjectIds in an array per document.
Maybe DBRefs could help you, or you could use a GridFS collection.
If you can avoid the joins, that would be the best solution in MongoDB.
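A quick back-of-the-envelope check of that figure (plain Python, no driver required):

    # 16 MB BSON document limit divided by 12 bytes per ObjectId.
    # Field names, array index keys and other fields also consume space,
    # so the practical ceiling is somewhat lower.
    MAX_DOC_BYTES = 16 * 1024 * 1024
    OBJECT_ID_BYTES = 12

    print(MAX_DOC_BYTES // OBJECT_ID_BYTES)  # 1398101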
Edited 1st July 2021
Since the document that contains the array cannot exceed 16 megabytes, the number of objects in an array is limited.
The other answers explain how to calculate this.
MongoDB Limits and Thresholds
Should I use the allowDiskUse option when the returned documents exceed the 16 MB limit in aggregation?
Or should I alter the db structure or code logic to avoid the limit?
What are the advantages and disadvantages of allowDiskUse?
Thanks for your help.
Here is the official doc I have seen:
Result Size Restrictions
Changed in version 2.6.
Starting in MongoDB 2.6, the aggregate command can return a cursor or store the results in a collection. When returning a cursor or storing the results in a collection, each document in the result set is subject to the BSON Document Size limit, currently 16 megabytes; if any single document exceeds the BSON Document Size limit, the command will produce an error. The limit only applies to the returned documents; during pipeline processing, the documents may exceed this size.
Memory Restrictions
Changed in version 2.6.
Pipeline stages have a limit of 100 megabytes of RAM. If a stage exceeds this limit, MongoDB will produce an error. To allow for the handling of large datasets, use the allowDiskUse option to enable aggregation pipeline stages to write data to temporary files.
https://docs.mongodb.com/manual/core/aggregation-pipeline-limits/
allowDiskUse is unrelated to the 16MB result size limit. That setting controls whether pipeline stages such as $sort or $group can use some temporary disk space if they need more than 100MB of memory. In theory, for an arbitrary pipeline this could be a very large amount of disk space. Personally, it has never been a problem for me, but that will come down to your data.
If your result is going to be more than 16MB then you need to use the $out pipeline stage to output the data to a collection or use a pipeline API that returns a cursor to results instead of returning all the data inline (for some drivers this is a separate method, for others it is a flag passed to the same method).
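For illustration, a minimal sketch combining the two (the orders collection and field names are made up): allowDiskUse lets memory-hungry stages spill to temporary files, while $out writes the result to a collection so nothing oversized is returned inline:

    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017")["shop"]

    pipeline = [
        {"$group": {"_id": "$customer_id", "total": {"$sum": "$amount"}}},
        {"$out": "customer_totals"},  # results are written to this collection
    ]

    # allowDiskUse only affects the 100 MB per-stage memory limit,
    # not the 16 MB document size limit.
    db.orders.aggregate(pipeline, allowDiskUse=True)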
I haven't found any information about this.
How many documents can a single collection have in MongoDB before it is necessary to use sharding?
There is no limitation, as you can see here:
If you specify a maximum number of documents for a capped collection using the max parameter to create, the limit must be less than 2^32 documents. If you do not specify a maximum number of documents when creating a capped collection, there is no limit on the number of documents.
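For reference, a small sketch of creating a capped collection with such a document-count cap via the Python driver (the names are hypothetical):

    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017")["mydb"]

    # size (bytes) is required for capped collections; max (document count)
    # is optional and must be less than 2^32.
    db.create_collection("recent_events", capped=True,
                         size=64 * 1024 * 1024, max=1_000_000)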
@M-A.Fernandes is right.
I can only add this information:
Maximum Number of Documents Per Chunk to Migrate
MongoDB cannot move a chunk if the number of documents in the chunk exceeds either 250000 documents or 1.3 times the result of dividing the configured chunk size by the average document size. db.collection.stats() includes the avgObjSize field, which represents the average document size in the collection.
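If you want to check that average document size from a driver rather than the shell, the collStats command behind db.collection.stats() exposes it (a sketch, with a made-up collection name):

    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017")["mydb"]

    # db.collection.stats() in the shell is a wrapper around collStats.
    stats = db.command("collStats", "events")
    print("documents:", stats["count"])
    print("average document size (bytes):", stats["avgObjSize"])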
I understand that in MongoDB a BSON document can be no bigger than 16 MB. Does this size limit account for embedded documents as well? I plan on having well over 16 MB of documents inside the embedding document.
A single MongoDB document cannot be larger than 16 MB and all of a document's embedded documents count toward this limit, so what you're planning won't work.
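One practical way to see this is to measure the encoded BSON size of a document, embedded documents included, before inserting it (a sketch; the document shape is invented, and the bson package ships with pymongo):

    import bson

    doc = {
        "name": "parent",
        "children": [{"payload": "x" * 1024} for _ in range(100)],  # embedded docs
    }

    # The encoded size counts everything nested inside the document,
    # which is what the 16 MB limit applies to.
    print(len(bson.encode(doc)), "bytes of", 16 * 1024 * 1024, "allowed")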