I have two collections, both with more than 15 million entries. I am indexing both collections and running find() on both, and I get the following error:
"10334: BSONObj size: 27624158 (0xDE82A501) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: ObjectId('532d4a424a33b081be8a0315')"
How can I resolve this error? I have already run a database repair, but it didn't work.
Related
I'm scanning a MongoDB collection which has large documents, some with BSON greater than 16 MB in size.
Essentially, I'm calling either of the two following aggregations, depending on a flag for random sampling:
documents = collection.aggregate(
    [{"$sample": {"size": sample_size}}], allowDiskUse=True)
OR
documents = collection.aggregate(
    [{"$limit": sample_size}], allowDiskUse=True)
sample_size is a parameter here.
The issue is that this command gets stuck for minutes on large BSON documents, and eventually MongoDB aborts the execution, so my scan of the entire collection never completes.
Is there a way to tell MongoDB to skip/ignore documents larger than a size threshold?
For those who think that MongoDB cannot store values larger than 16 MB, here is the error message from a metadata collector (LinkedIn DataHub):
OperationFailure: BSONObj size: 17375986 (0x10922F2) is invalid.
Size must be between 0 and 16793600(16MB) First element: _id: "Topic XYZ",
full error: {'operationTime': Timestamp(1634531126, 2), 'ok': 0.0, 'errmsg': 'BSONObj size: 17375986 (0x10922F2) is invalid. Size must be between 0 and 16793600(16MB)
The maximum document size is 16 MB; see the MongoDB limits documentation (the exception is the GridFS specification).
Each document already in your collection is therefore < 16 MB; MongoDB doesn't allow us to store bigger documents.
If you want to filter, let's say to documents < 10 MB, you can use the "$bsonSize" operator to get the size of a document and filter out the big ones, as in the sketch below.
I want to fetch a large number (1 million) of documents by their ObjectIds, which are stored in a list obj_ids. So I use
docs = collection.find({'_id' : {'$in' : obj_ids}})
However, when trying to access the documents (e.g. list(docs)) I get
pymongo.errors.DocumentTooLarge: BSON document too large (19889042 bytes) - the connected server supports BSON document sizes up to 16777216 bytes.
which confuses me. As I understand it, the 16 MB limit applies to a single document. But I don't think I have any document exceeding this limit:
I did not get this error message when inserting any of the documents in the first place.
This error does not show up if I chunk the ObjectIds into 2 subsets, and recombine the results later.
So if there is no overly large document in my collection, what is this error message about?
Your query {'_id': {'$in': obj_ids}} is the issue; the query document itself is too large, not the documents in your collection.
You'll need to refactor your approach; for example, run the lookup in batches and join the results, as sketched below.
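A minimal sketch of that batching with pymongo, reusing obj_ids and collection from the question; the batch size here is an illustrative choice:

batch_size = 100_000  # keeps each $in array well under the 16 MB BSON limit
docs = []
for i in range(0, len(obj_ids), batch_size):
    batch = obj_ids[i:i + batch_size]
    docs.extend(collection.find({"_id": {"$in": batch}}))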
I'm reaching some sort of RAM limit when doing this query; here's the error:
The operation: #<Moped::Protocol::Query
#length=100
#request_id=962
#response_to=0
#op_code=2004
#flags=[]
#full_collection_name="test_db.cases"
#skip=1650
#limit=150
#selector={"$query"=>{}, "$orderby"=>{"created_at"=>1}}
#fields=nil>
failed with error 17144: "Runner error: Overflow sort stage buffered data usage of 33555783 bytes exceeds internal limit of 33554432 bytes"
See https://github.com/mongodb/mongo/blob/master/docs/errors.md
for details about this error.
There are two solutions I can think of:
1) Raise the buffer limit. This requires MongoDB 2.8, which is an unstable release that I'd have to install manually.
2) Break the query apart and chunk it? This is what the query looks like:
upload_set = Case.all.order_by(:created_at.asc).skip(#set_skipper).limit(150).each_slice(5).to_a
#set_skipper grows by 150 every time the method is called.
Any help?
From http://docs.mongodb.org/manual/reference/limits/
Sorted Documents
MongoDB will only return sorted results on fields without an index if
the combined size of all documents in the sort operation, plus a small
overhead, is less than 32 megabytes.
Did you try using an index on created_at ? That should remove that limitation.
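For illustration, a minimal pymongo sketch of adding that index (the question itself uses Mongoid, so this is only a sketch against the test_db.cases namespace shown in the error):

from pymongo import MongoClient

cases = MongoClient()["test_db"]["cases"]
cases.create_index([("created_at", 1)])

# With the index, the sorted, paged query no longer needs the 32 MB in-memory sort buffer.
page = cases.find().sort("created_at", 1).skip(1650).limit(150)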
I have run into a problem where I have exceeded the allowed BSON size of 16 MB, and I now get this error whenever I try to do anything on my collection. Now my questions are:
How do I repair and solve the problem?
How do I check whether it is an individual document within my collection, or the collection itself, that exceeds the limit?
How do I remove the offending document? I just keep getting this error whenever I try doing anything with this collection.
I have already tried db.repairDatabase(), but I just keep getting the same error:
"errmsg" : "exception: BSONObj size: 1718558820 (0x666F2064) is invalid. Size must be between 0 and 16793600(16MB) First element: ...: ?type=32",
"code" : 10334,
"ok" : 0
Look at the size. It's obviously not a size, it's four ASCII characters. Go and find your bug.
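To illustrate, decoding that hex "size" as bytes yields printable ASCII, which strongly suggests the reader is parsing text (or corrupted data) where it expects a BSON length field:

>>> bytes.fromhex("666F2064").decode("ascii")
'fo d'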
When I try to fetch a file stored in MongoDB via GridFS (300 MB), I get this error:
2014-07-16T22:50:10.201+0200 [conn1139] assertion 17144 Runner error:
Overflow sort stage buffered data usage of 33563462 bytes exceeds internal limit of 33554432 bytes ns:myproject.export_data.chunks query:{ $query: { files_id: ObjectId('53c6e5485f00005f00c6bae6'), n: { $gte: 0, $lte: 1220 } }, $orderby: { n: 1 } }
I found something similar, but it's already fixed:
https://jira.mongodb.org/browse/SERVER-13611
I'm using MongoDB 2.6.3
I'm not sure which driver or driver version you are using, but it is clear that your implementation is issuing a "sort", and without an index you are blowing past the 32 MB memory sort limit when pulling in chunks over a range.
Better driver implementations do not do this and instead "cycle" through the chunks with individual queries. But the problem here is that your collection is missing the index it needs, either because of your own setup or because of the driver implementation that created this collection.
It seems you have named your "root" space "export_data", so switch to the database containing the GridFS collections and issue the following:
db.export_data.chunks.ensureIndex( { files_id: 1, n: 1 }, { unique: true } )
Or add something in your application code that does this to ensure the index exists.
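For example, a minimal pymongo sketch of ensuring that index from application code, assuming the myproject database from the error's namespace:

from pymongo import ASCENDING, MongoClient

db = MongoClient()["myproject"]
# GridFS readers rely on this unique { files_id: 1, n: 1 } index on the chunks
# collection to fetch chunks in order without an in-memory sort.
db["export_data.chunks"].create_index(
    [("files_id", ASCENDING), ("n", ASCENDING)], unique=True)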
This is not a bug. It's clearly about the sort, as described in the error message, not about GridFS. Read this section about the sort limitation:
MongoDB will only return sorted results on fields without an index if the sort operation uses less than 32 megabytes of memory.
Which means your sort aborts if it uses more than 32 MB of memory without an index.
It would be better if you could post the statements you are executing.