I'm new to MongoDB and am exploring capped collections.
I created a capped collection with size: 10, which I assume is in bytes.
Then I inserted a document of size 52 (referring to the size field from db.collection.stats()).
Shouldn't this document be rejected, since its size is greater than 10 bytes?
As the documentation for MongoDB 2.6 says, "If the size field is less than or equal to 4096, then the collection will have a cap of 4096 bytes. Otherwise, MongoDB will raise the provided size to make it an integer multiple of 256." You can see the size MongoDB actually chose by querying system.namespaces:
> // Create collection of size 10.
> db.createCollection('my_collection', {capped: true, size: 10})
{ "ok" : 1 }
> // MongoDB actually sets the size to 4096.
> db.system.namespaces.findOne({name: 'test.my_collection'}).options
{ "capped" : true, "size" : 4096 }
Related
I have about 3 collections, and none of them has much data: each holds about 50 objects/items of around 200 characters each, yet a single collection is taking up ≈270 KB of space, which seems like a lot. I don't understand why.
So, going back to the question: do those collections each have a limit of 16 MB, or does the limit apply to the entire database? Please help. Thank you.
Here are some examples:
Object.bsonsize({ a: null }) => 8
Object.bsonsize({}) => 5
Object.bsonsize({ _id: ObjectId() }) => 22
Object.bsonsize({a: "fo"}) => 15
Object.bsonsize({a: "foo"}) => 16
Object.bsonsize({ab: "fo"}) => 16
5 bytes seems to be the smallest possible size.
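These numbers follow directly from the BSON format: every document consists of a 4-byte length prefix, its elements, and a trailing null byte, which is why the empty document already takes 5 bytes. A breakdown of one of the examples above:
// Byte-level breakdown of Object.bsonsize({a: "fo"}) == 15:
//   4 bytes: int32 total document length
//   1 byte:  element type (0x02 = UTF-8 string)
//   2 bytes: element name "a" plus null terminator
//   4 bytes: int32 string length
//   3 bytes: "fo" plus null terminator
//   1 byte:  final null byte terminating the document
// 4 + 1 + 2 + 4 + 3 + 1 = 15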
You can retrieve the BSON size of your documents with this aggregation pipeline (the $bsonSize operator requires MongoDB 4.4 or newer; see the last answer below):
db.collection.aggregate([{ $project: { size: { $bsonSize: "$$ROOT" } } }])
The max size of a document is 16 MiB, which is a hard-coded limit. Each document has an _id field, typically holding an ObjectId value, so the minimum size of a typical document is 22 bytes.
According to MongoDB: The Definitive Guide, the entire text of War and Peace is just 3.14 MB, so 16 MiB is quite a lot.
I need the sizes of all collections in a MongoDB database, sorted in descending order of size, along with their storage sizes.
Sample output:
admin.system.users: 2717 (36864)
admin.system.keys: 170 (16384)
admin.system.version: 104 (16384)
Can someone please help me?
Voila:
var s = [];
db.getCollectionNames().forEach(function(n) { s.push(db[n].stats()); });
s = s.sort(function(a, b) { return b['size'] - a['size']; });
for (var c in s) {
    print(s[c]['ns'] + " " + s[c]['size'] + " (" + s[c]['storageSize'] + ")");
}
Sample output:
test.test1 75682763 (11538432)
test.data 736 (32768)
Running MongoDB 4.0.4 on Debian GNU/Linux 9 (stretch), with a WiredTiger collection capped at 2.9 TB and an unlimited number of documents.
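Presumably the collection was created along these lines (a sketch; the collection name is hypothetical and the size matches the maxSize reported below):
db.createCollection("mycollection", {
    capped: true,
    size: 3113851291136  // ~2.9 TB cap on uncompressed data
})
Here is the relevant db.collection.stats() output: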
{
    ...
    "size" : NumberLong(3113851252530),    // <= ~2.9T
    "count" : 238059628,
    "avgObjSize" : 13080,
    "storageSize" : 863343902720.0,        // <= ~804G
    "capped" : true,
    "max" : -1,
    "maxSize" : NumberLong(3113851291136), // <= 2.9T
    ...
}
So size reached the cap of ~2.9 TB and MongoDB started overwriting older documents. However, I created the collection with storage size in mind; I couldn't care less about the size of the uncompressed data.
There's plenty of space left on disk (8 TB). I also got the exact same behavior with a collection capped at 1M.
Can I cap a collection size by storageSize instead?
I'm running a query against MongoDB (3.2 in my case) with a sort, and I'm getting:
OperationFailed: Sort operation used more than the maximum 33554432 bytes of RAM.
I understand that I can use an index to avoid this. In my case this is an operation that I run very rarely, so the overhead of an index doesn't make sense (it's also fine if this operation takes a long time and consumes a lot of resources). I'm pretty sure I'll end up using an aggregation with allowDiskUse to work around this, but I was curious about something.
I'm curious whether using a projection can reduce the memory footprint required by the in-memory sort. Similarly I'm wondering whether a limit() can reduce this footprint (since the sort only needs to keep the top/bottom N in memory).
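For reference, the allowDiskUse workaround mentioned above would look something like this sketch (the collection name and sort field are assumptions):
db.coll.aggregate(
    [ { $sort: { a: 1 } } ],
    { allowDiskUse: true }
)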
If your collection is not indexed, projection will not help you sidestep the 32 MB in-memory sort limit. On the other hand, limit() can help you if the resulting set is less than 32 MB in size.
Note This answer concerns only the regular find() method, and not the corresponding $match + $sort + $limit aggregation stages.
Unindexed Find + Projection + Sort
The Interaction with Projection documentation mentions:
When a set of results are both sorted and projected, the MongoDB query engine will always apply the sorting first.
This can be shown using the explain() method. For example, consider an unindexed collection containing documents in the form of:
{"a": <a short string>, "b": <a large 10 MB value>}
The explain() result of a sort with projection shows:
> db.coll.explain().find({},{a:1}).sort({a:1})
...
"winningPlan": {
"stage": "PROJECTION",
"transformBy": {
"a": 1
},
"inputStage": {
"stage": "SORT",
"sortPattern": {
"a": 1
},
"inputStage": {
"stage": "SORT_KEY_GENERATOR",
"inputStage": {
"stage": "COLLSCAN",
"direction": "forward"
}
}
}
},
...
From the explain() output, the stages of the query go in the order:
COLLSCAN -> SORT -> PROJECTION
This means that projection will not be able to help you when your result set size exceeds 32 MB.
Running the query thus resulted in the expected failure:
> db.coll.find({},{a:1}).sort({a:1})
Error: error: {
    "ok": 0,
    "errmsg": "Executor error during find command: OperationFailed: Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit.",
    "code": 96,
    "codeName": "OperationFailed"
}
Unindexed Find + Limit + Sort
The Limit results documentation mentions:
If MongoDB cannot obtain the sort order via an index scan, then MongoDB uses a top-k sort algorithm. This algorithm buffers the first k results (or last, depending on the sort order) seen so far by the underlying index or collection access. If at any point the memory footprint of these k results exceeds 32 megabytes, the query will fail.
The limit() will help in this regard only if the total result set to be sorted is still under 32 MB.
For example (using the 10 MB per document example above), doing a
find({}, <projection>).limit(3).sort(...)
will work, since the total size that needs to be sorted is 3x10 MB == 30 MB.
However, doing
find({}, <projection>).limit(4).sort(...)
will fail, since the result set to be sorted will contain 4x10 MB == 40 MB. In both cases the projection does not matter; only the total size of the result set being sorted does.
I solved this problem by creating an index for the sort parameters.
Ex:
db.collection.createIndex({ code: 1, name: 1 }, { collation: { locale: 'en' } })
db.collection.find({ .... }).projection({...}).sort({ code: 1, name: 1 });
This way you avoid exceeding the memory limit for sorts (32 MB). But if in your case each document is big, you can also increase this limit:
db.adminCommand({ setParameter: 1, internalQueryExecMaxBlockingSortBytes: 104857600 }) // e.g. 100 MB; the default is 33554432 (32 MB)
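To confirm the value currently in effect, you can read the parameter back with getParameter:
db.adminCommand({ getParameter: 1, internalQueryExecMaxBlockingSortBytes: 1 })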
Is there a way to get the size of all the documents that meet a certain query in the MongoDB shell?
I'm creating a tool that will use mongodump (see here) with the query option to dump specific data to an external media device. However, I would like to check whether all the documents will fit on the device before starting the dump. That's why I would like to get the size of all the documents that meet the query.
I am aware of the Object.bsonsize method described here, but it seems that it only returns the size of one document.
Here's the answer that I've found:
var cursor = db.collection.find(...); // Add your query here.
var size = 0;
cursor.forEach(function(doc) {
    size += Object.bsonsize(doc);
});
print(size);
This should output the size of the documents in bytes fairly accurately.
I've run the command twice. The first time, there were 141,215 documents which, once dumped, totaled about 108 MB. The difference between the command's output and the size on disk was 787 bytes.
The second time, there were 35,914,179 documents which, once dumped, totaled about 57.8 GB. This time the command's output matched the size on disk exactly.
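For reference, the mongodump invocation described in the question might then look something like this (database, collection, filter, and output path are all hypothetical; newer mongodump versions expect the --query filter in extended JSON):
mongodump --db=mydb --collection=mycoll --query='{ "a": { "$exists": true } }' --out=/mnt/external/dump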
Starting in Mongo 4.4, $bsonSize returns the size in bytes of a given document when encoded as BSON.
Thus, in order to sum the BSON sizes of all documents in your collection:
// { d: [1, 2, 3, 4, 5] }
// { a: 1, b: "hello" }
// { c: 1000, a: "world" }
db.collection.aggregate([
  { $group: {
      _id: null,
      size: { $sum: { $bsonSize: "$$ROOT" } }
  }}
])
// { "_id" : null, "size" : 177 }
This $groups all matching documents together and $sums their $bsonSize; $$ROOT represents the current document being sized.
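To restrict the sum to documents matching a particular query, prepend a $match stage (the filter below is a hypothetical example):
db.collection.aggregate([
  { $match: { a: { $exists: true } } }, // replace with your query
  { $group: {
      _id: null,
      size: { $sum: { $bsonSize: "$$ROOT" } }
  }}
])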