I'm scanning a MongoDB collection that has large documents whose BSON exceeds 16 MB in size.
Essentially, I'm calling one of the two commands below, depending on a flag for random sampling:
documents = collection.aggregate(
    [{"$sample": {"size": sample_size}}], allowDiskUse=True)

OR

documents = collection.aggregate(
    [{"$limit": sample_size}], allowDiskUse=True)
sample_size is a parameter here.
The issue is that this command gets stuck for minutes on the large BSON documents; eventually MongoDB aborts the execution, and my scan of the entire collection never completes.
Is there a way to tell MongoDB to skip/ignore documents whose size is larger than a threshold?
For those who think that MongoDB cannot store values larger than 16 MB, here is the error message reported by a metadata collector (LinkedIn DataHub):
OperationFailure: BSONObj size: 17375986 (0x10922F2) is invalid.
Size must be between 0 and 16793600(16MB) First element: _id: "Topic XYZ",
full error: {'operationTime': Timestamp(1634531126, 2), 'ok': 0.0, 'errmsg': 'BSONObj size: 17375986 (0x10922F2) is invalid. Size must be between 0 and 16793600(16MB)
The maximum document size is 16 MB (the exception being the GridFS specification).
In your collection each document is already < 16 MB; MongoDB doesn't allow us to store bigger documents.
If you want to filter, let's say to documents smaller than 10 MB, you can use the "$bsonSize" operator to get the size of a document and filter out the big ones.
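Here is a minimal pymongo sketch of that idea, reusing the collection and sample_size from the question; $bsonSize requires MongoDB 4.4+, and the 10 MB threshold is just an example value.

MAX_DOC_BYTES = 10 * 1024 * 1024  # example threshold

documents = collection.aggregate(
    [
        # keep only documents whose BSON encoding is below the threshold
        {"$match": {"$expr": {"$lt": [{"$bsonSize": "$$ROOT"}, MAX_DOC_BYTES]}}},
        {"$sample": {"size": sample_size}},
    ],
    allowDiskUse=True,
)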
I am using MongoDB Atlas.
I recently found out that the total size of one of my collections is 67.75 MB, but one of its unique indexes is taking 294 MB. How is this even possible? How can the index size be more than double the total document size?
Please take a look at the screenshot below:
[MongoDB indexes screenshot]
What you see in the picture as 67.75 MB is actually the storageSize, which is usually the compressed size of your collection (db.myCollection.stats(1024*1024).storageSize). To view your actual collection size in MB you can execute:
db.myCollection.stats(1024*1024).size
This will show your collection's uncompressed size, which is expected to be much more than only 67.75 MB.
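If you prefer to check this from Python, a minimal pymongo sketch (assuming db is your Database object and the collection is named myCollection) is to run the collStats command, which reports both figures plus the index size:

stats = db.command("collStats", "myCollection", scale=1024 * 1024)  # scale to MB
print("uncompressed data size (MB):", stats["size"])
print("compressed storageSize (MB):", stats["storageSize"])
print("totalIndexSize (MB):", stats["totalIndexSize"])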
I cannot find the answer to this seemingly basic question. The documentation says the max size for a BSON document is 16MB.
I want to know what the maximum allowed size of a query result is. E.g., if there are 1,000 records in a collection and each document is 1 MB, will MongoDB throw an error if a query returns 400 documents (totaling 400 MB)?
I want to get a large number (1 million) of documents by their ObjectId, stored in a list obj_ids. So I use
docs = collection.find({'_id' : {'$in' : obj_ids}})
However, when trying to access the documents (e.g. list(docs)) I get
pymongo.errors.DocumentTooLarge: BSON document too large (19889042 bytes) - the connected server supports BSON document sizes up to 16777216 bytes.
which confuses me. As I understand it, the 16 MB limit applies to a single document, but I don't think I have any document exceeding this limit:
I did not get this error message when inserting any of the documents in the first place.
This error does not show up if I chunk the ObjectIds into 2 subsets, and recombine the results later.
So if there is not some too big document in my collection, what is the error message about?
Your query {'_id' : {'$in' : obj_ids}} is the issue: the query document itself is too large, not the documents in your collection.
You'll need to refactor your approach; for example, run the query in batches and join the results, as in the sketch below.
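A minimal pymongo sketch of that batching, assuming collection and obj_ids from the question; the batch size is an illustrative value you would tune:

BATCH_SIZE = 100_000  # keeps each $in query document well under 16 MB

docs = []
for i in range(0, len(obj_ids), BATCH_SIZE):
    batch = obj_ids[i:i + BATCH_SIZE]
    docs.extend(collection.find({"_id": {"$in": batch}}))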
I am using MongoDB for one of my applications.
We are fetching a large number of records from the db, and we are facing the following issue when we do so:
aggregation result exceeds maximum document size
Is there any option to set this max limit?
The documentation states that:
If you do not specify the cursor option or store the results in a collection, the aggregate command returns a single BSON document that contains a field with the result set. As such, the command will produce an error if the total size of the result set exceeds the BSON Document Size limit.
Earlier versions of the aggregate command can only return a single BSON document that contains the result set and will produce an error if the total size of the result set exceeds the BSON Document Size limit.
The maximum BSON document size is 16 megabytes.
That is actually the problem you are hitting now.
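A minimal pymongo sketch of the two ways out that the quoted documentation mentions, assuming db, a collection named my_collection, your pipeline, and a hypothetical process() handler:

# Option 1: iterate the cursor returned by aggregate() instead of
# materialising the whole result set as one BSON document; results are
# then streamed from the server in batches.
for doc in db.my_collection.aggregate(pipeline, allowDiskUse=True, batchSize=1000):
    process(doc)

# Option 2: store the results in another collection with $out, then read
# them back with a normal, cursor-based find().
db.my_collection.aggregate(pipeline + [{"$out": "agg_results"}])
for doc in db.agg_results.find():
    process(doc)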
By default, the max size of a MongoDB document is 16 MB. However, if my collection has an array of documents, does the same limit apply to that as well?
For example:
"Address":[{"name":"value1","pincode":"123456"},{"name":"value1","pincode":"123456"},{"name":"value1","pincode":"123456"}]
Here, is it the case that my Address collection cannot be more than 16 MB?
Address is a collection that contains documents like {"name":"value1","pincode":"123456"}.
Each document in the collection is limited to 16 MB, but the size of the collection is not limited.
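As an illustration, a minimal sketch using the bson package that ships with pymongo (the document shape is just the example from the question): the 16 MB limit is measured on each encoded document as a whole, embedded Address array included, and you can check that size before inserting.

import bson

doc = {
    "_id": "customer-1",  # hypothetical _id
    "Address": [
        {"name": "value1", "pincode": "123456"},
        {"name": "value1", "pincode": "123456"},
        {"name": "value1", "pincode": "123456"},
    ],
}

encoded = bson.encode(doc)  # the whole document, embedded array included
print(len(encoded), "bytes; must stay under 16 * 1024 * 1024")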