Mongoose limit/offset and count query - mongodb

Bit of an odd one on query performance... I need to run a query which does a total count of documents, and can also return a result set that can be limited and offset.
So, I have 57 documents in total, and the user wants 10 documents offset by 20.
I can think of 2 ways of doing this, first is query for all 57 documents (returned as an array), then using array.slice return the documents they want. The second option is to run 2 queries, the first one using mongo's native 'count' method, then run a second query using mongo's native $limit and $skip aggregators.
Which do you think would scale better? Doing it all in one query, or running two separate ones?
Edit:
// 1 query
var limit = 10;
var offset = 20;
Animals.find({}, function (err, animals) {
if (err) {
return next(err);
}
res.send({count: animals.length, animals: animals.slice(offset, limit + offset)});
});
// 2 queries
Animals.find({}, {limit:10, skip:20} function (err, animals) {
if (err) {
return next(err);
}
Animals.count({}, function (err, count) {
if (err) {
return next(err);
}
res.send({count: count, animals: animals});
});
});

I suggest you to use 2 queries:
db.collection.count() will return total number of items. This value is stored somewhere in Mongo and it is not calculated.
db.collection.find().skip(20).limit(10) here I assume you could use a sort by some field, so do not forget to add an index on this field. This query will be fast too.
I think that you shouldn't query all items and than perform skip and take, cause later when you have big data you will have problems with data transferring and processing.

Instead of using 2 separate queries, you can use aggregate() in a single query:
Aggregate "$facet" can be fetch more quickly, the Total Count and the Data with skip & limit
db.collection.aggregate([
//{$sort: {...}}
//{$match:{...}}
{$facet:{
"stage1" : [ {"$group": {_id:null, count:{$sum:1}}} ],
"stage2" : [ { "$skip": 0}, {"$limit": 2} ]
}},
{$unwind: "$stage1"},
//output projection
{$project:{
count: "$stage1.count",
data: "$stage2"
}}
]);
output as follows:-
[{
count: 50,
data: [
{...},
{...}
]
}]
Also, have a look at https://docs.mongodb.com/manual/reference/operator/aggregation/facet/

db.collection_name.aggregate([
{ '$match' : { } },
{ '$sort' : { '_id' : -1 } },
{ '$facet' : {
metadata: [ { $count: "total" } ],
data: [ { $skip: 1 }, { $limit: 10 },{ '$project' : {"_id":0} } ] // add projection here wish you re-shape the docs
} }
] )
Instead of using two queries to find the total count and skip the matched record.
$facet is the best and optimized way.
Match the record
Find total_count
skip the record
And also can reshape data according to our needs in the query.

There is a library that will do all of this for you, check out mongoose-paginate-v2

After having to tackle this issue myself, I would like to build upon user854301's answer.
Mongoose ^4.13.8 I was able to use a function called toConstructor() which allowed me to avoid building the query multiple times when filters are applied. I know this function is available in older versions too but you'll have to check the Mongoose docs to confirm this.
The following uses Bluebird promises:
let schema = Query.find({ name: 'bloggs', age: { $gt: 30 } });
// save the query as a 'template'
let query = schema.toConstructor();
return Promise.join(
schema.count().exec(),
query().limit(limit).skip(skip).exec(),
function (total, data) {
return { data: data, total: total }
}
);
Now the count query will return the total records it matched and the data returned will be a subset of the total records.
Please note the () around query() which constructs the query.

You don't have to use two queries or one complicated query with aggregate and such.
You can use one query
example:
const getNames = async (queryParams) => {
const cursor = db.collection.find(queryParams).skip(20).limit(10);
return {
count: await cursor.count(),
data: await cursor.toArray()
}
}
mongo returns a cursor that has predefined functions such as count, which will return the full count of the queried results regardless of skip and limit
So in count property, you will get the full length of the collection and in data, you will get just the chunk with offset of 20 and limit of 10 documents

Thanks Igor Igeto Mitkovski, a best solution is using native connection
document is here: https://docs.mongodb.com/manual/reference/method/cursor.count/#mongodb-method-cursor.count
and mongoose dont support it ( https://github.com/Automattic/mongoose/issues/3283 )
we have to use native connection.
const query = StudentModel.collection.find(
{
age: 13
},
{
projection:{ _id:0 }
}
).sort({ time: -1 })
const count = await query.count()
const records = await query.skip(20)
.limit(10).toArray()

Related

Mongodb Aggregation slow count using facet

I am wanting to use a facet to create a simple query that i can use to get paged data, however i have noticed that if i do this i get really poor performance when compared to running just two seperate queries.
As a quick test i created a collection with 50000 random documents and ran the following test.
var x = new Date();
var a = {
count : db.getCollection("test").find({}).count(),
data: db.getCollection("test").find({}).skip(0).limit(10)
};
var y = new Date();
print('result ' + a);
print(y - x);
var x = new Date();
var a = db.getCollection("test").aggregate(
[
{
"$match" : {
}
},
{
"$facet" : {
"data": [
{
"$skip": 0
},
{
"$limit": 10
}
],
"pageInfo": [
{
"$group": {
"_id": null,
"count": {
"$sum": 1
}
}
}
]
}
}
]
)
var y = new Date();
print('result ' + a);
print(y - x);
The result of this is that two seperate queries one for find the other for count takes around 2 milliseconds vs the aggregation single query taking upwards of 500 milliseconds.
Why is it that the aggregation is so slow?
Update
Even just a count without a facet within an aggregation is slow
var x = new Date();
var a = db.getCollection("test").find({}).count();
var y = new Date();
print('result ' + a);
print(y - x);
var x = new Date();
var a = db.getCollection("test").aggregate(
[
{ "$count" : "count" }
]
)
var y = new Date();
print('result ' + a);
print(y - x);
In the above with my test data set, the aggregation count takes 200ms vs the Count method taking 2ms.
This issue extends into the NodeJs Mongodb Driver where the .Count() method has been deprecated and replaced with a countDocuments() method, under the hood the new countDocuments() method is using an aggregation and not the count method on a find just like my example above it has significantly worse performance to the point at which i will continue using the deprecated method over the newer countDocuments() method.
Of course it is slow. The count() method just returns the cursor size after a query is applied (which does not necessarily require all documents to be read, depending on your query and indices). Furthermore, with an empty query, the query optimizer knows that all documents ought to be returned and basically only has to return length(_id_1).
Aggregations, by definition, do not work that way. Unless there is a match stage actually ruling out a document, each and every document is read from “disk” (MongoDB’s own cache and FS caches aside for the moment) for further processing.
I am running into the same issue, and I just hope that anyone might have a better answer then what was previously posted.
I have a "user" collection with 12 million users in it, using MongoDB 5.0.
My query looks like this:
db.users.aggregate([
{ '$sort': { updated_at: -1 } },
{ '$facet': {
results: [
{ $skip: 0 },
{ $limit: 20 }
],
total: [
{ $count: 'count' }
]
}
}
])
The query takes around 1 minute, so that is not acceptable.
I have an index on "updated_at", that is not the issue.
Also, I have this issue even if I run it directly on MongoShell in Compass. So it is not related to any NodeJs Mongo Driver as was previously suspected.
Can I somehow tell Mongo to use the estimated count here?
Or is there any other way to improve the query?

How to get percentage of boolean value documents in Mongo

If I have a collection of documents in MongoDB that all have a boolean value, is there a simple, single function I can run that will return the percentage of documents for which that value is true?
That is to say, if all of my documents look something like this:
{
_id: ObjectId("Blahblahblah"),
someAttr: true,
}
and only 43% of the total number of collections have a true value for someAttr, is there any way of retrieving that 43% figure with a single request, or do I have to make two requests to the DB, one to determine how many are true and one to determine how many are false?
Not a simple one. But you can get a count of all documents and all documents where the field is true with an aggregation:
db.collection.aggregate([{
{ $group: {
_id: 1,
all_count: { $sum: 1 },
true_count: { $sum: { $cmp: [ "$someAttr", false ] } }
}
}
}]);
Converting these counts into a percentage would be possible, but unnecessarily complicated to do with MongoDB itself. I would recommend you to do it on the application layer.
If you don't mind about the performances you can do it with a single request like this.
function getPercentage(callback)
{
var trues = 0;
var total = 0;
var docs = db.collection.find();
docs.forEach(function(doc) {
total++;
if(doc.someAttr)
trues++;
if(total == docs.length)
callback(trues/total * 100);
});
}

documents with tags in mongodb: getting tag counts

I have a collection1 of documents with tags in MongoDB. The tags are an embedded array of strings:
{
name: 'someObj',
tags: ['tag1', 'tag2', ...]
}
I want to know the count of each tag in the collection. Therefore I have another collection2 with tag counts:
{
{
tag: 'tag1',
score: 2
}
{
tag: 'tag2',
score: 10
}
}
Now I have to keep both in sync. It is rather trivial when inserting to or removing from collection1. However when I update collection1 I do the following:
1.) get the old document
var oldObj = collection1.find({ _id: id });
2.) calculate the difference between old and new tag arrays
var removedTags = $(oldObj.tags).not(obj.tags).get();
var insertedTags = $(obj.tags).not(oldObj.tags).get();
3.) update the old document
collection1.update(
{ _id: id },
{ $set: obj }
);
4.) update the scores of inserted & removed tags
// increment score of each inserted tag
insertedTags.forEach(function(val, idx) {
// $inc will set score = 1 on insert
collection2.update(
{ tag: val },
{ $inc: { score: 1 } },
{ upsert: true }
)
});
// decrement score of each removed tag
removedTags.forEach(function(val, idx) {
// $inc will set score = -1 on insert
collection2.update(
{ tag: val },
{ $inc: { score: -1 } },
{ upsert: true }
)
});
My questions:
A) Is this approach of keeping book of scores separately efficient? Or is there a more efficient one-time query to get the scores from collection1?
B) Even if keeping book separately is the better choice: can that be done in less steps, e.g. letting mongoDB calculate what tags are new / removed?
The solution, as nickmilion correctly states, would be an aggregation. Though I would do it with a nack: we'll save its results in a collection. What will do is to trade real time results for an extreme speed boost.
How I would do it
More often than not, the need for real time results is overestimated. Hence, I'd go with precalculated stats for the tags and renew it every 5 minutes or so. That should be well enough, since most of such calls are requested async by the client and hence some delay in case the calculation has to be made on a specific request is negligible.
db.tags.aggregate(
{$unwind:"$tags"},
{$group: { _id:"$tags", score:{"$sum":1} } },
{$out:"tagStats"}
)
db.tagStats.update(
{'lastRun':{$exists:true}},
{'lastRun':new Date()},
{upsert:true}
)
db.tagStats.ensureIndex({lastRun:1}, {sparse:true})
Ok, here is the deal. First, we unwind the tags array, group it by the individual tags and increment the score for each occurrence of the respective tag. Next, we upsert lastRun in the tagStats collection, which we can do since MongoDB is schemaless. Next, we create a sparse index, which only holds values for documents in which the indexed field exists. In case the index already exists, ensureIndex is an extremely cheap query; however, since we are going to use that query in our code, we don't need to create the index manually. With this procedure, the following query
db.tagStats.find(
{lastRun:{ $lte: new Date( ISODate().getTime() - 300000 ) } },
{_id:0, lastRun:1}
)
becomes a covered query: A query which is answered from the index, which tends to reside in RAM, making this query lightning fast (slightly less than 0.5 msecs median in my tests). So what does this query do? It will return a record when the last run of the aggregation was run more than 5 minutes ( 5*60*1000 = 300000 msecs) ago. Of course, you can adjust this to your needs.
Now, we can wrap it up:
var hasToRun = db.tagStats.find(
{lastRun:{ $lte: new Date( ISODate().getTime() - 300000 ) } },
{_id:0, lastRun:1}
);
if(hasToRun){
db.tags.aggregate(
{$unwind:"$tags"},
{$group: {_id:"$tags", score:{"$sum":1} } },
{$out:"tagStats"}
)
db.tagStats.update(
{'lastRun':{$exists:true}},
{'lastRun':new Date()},
{upsert:true}
);
db.tagStats.ensureIndex({lastRun:1},{sparse:true});
}
// For all stats
var tagsStats = db.tagStats.find({score:{$exists:true}});
// score for a specific tag
var scoreForTag = db.tagStats.find({score:{$exists:true},_id:"tag1"});
Alternative approach
If real time results really matter and you need the stats for all the tags, simply use the aggregation without saving it to another collection:
db.tags.aggregate(
{$unwind:"$tags"},
{$group: { _id:"$tags", score:{"$sum":1} } },
)
If you only need the results for one specific tag at a time, a real time approach could be to use a special index, create a covered query and simply count the results:
db.tags.ensureIndex({tags:1})
var numberOfOccurences = db.tags.find({tags:"tag1"},{_id:0,tags:1}).count();
answering your questions:
B): you don't have to calculate the dif yourself use $addToSet
A): you can get the counts via aggregation framework with a combination of $unwind and $count

MongoDB - Aggregation Framework (Total Count)

When running a normal "find" query on MongoDB I can get the total result count (regardless of limit) by running "count" on the returned cursor. So, even if I limit to result set to 10 (for example) I can still know that the total number of results was 53 (again, for example).
If I understand it correctly, the aggregation framework, however, doesn't return a cursor but simply the results. And so, if I used the $limit pipeline operator, how can I know the total number of results regardless of said limit?
I guess I could run the aggregation twice (once to count the results via $group, and once with $limit for the actual limited results), but this seems inefficient.
An alternative approach could be to attach the total number of results to the documents (via $group) prior to the $limit operation, but this also seems inefficient as this number will be attached to every document (instead of just returned once for the set).
Am I missing something here? Any ideas? Thanks!
For example, if this is the query:
db.article.aggregate(
{ $group : {
_id : "$author",
posts : { $sum : 1 }
}},
{ $sort : { posts: -1 } },
{ $limit : 5 }
);
How would I know how many results are available (before $limit)? The result isn't a cursor, so I can't just run count on it.
There is a solution using push and slice: https://stackoverflow.com/a/39784851/4752635 (#emaniacs mentions it here as well).
But I prefer using 2 queries. Solution with pushing $$ROOT and using $slice runs into document memory limitation of 16MB for large collections. Also, for large collections two queries together seem to run faster than the one with $$ROOT pushing. You can run them in parallel as well, so you are limited only by the slower of the two queries (probably the one which sorts).
First for filtering and then grouping by ID to get number of filtered elements. Do not filter here, it is unnecessary.
Second query which filters, sorts and paginates.
I have settled with this solution using 2 queries and aggregation framework (note - I use node.js in this example):
var aggregation = [
{
// If you can match fields at the begining, match as many as early as possible.
$match: {...}
},
{
// Projection.
$project: {...}
},
{
// Some things you can match only after projection or grouping, so do it now.
$match: {...}
}
];
// Copy filtering elements from the pipeline - this is the same for both counting number of fileter elements and for pagination queries.
var aggregationPaginated = aggregation.slice(0);
// Count filtered elements.
aggregation.push(
{
$group: {
_id: null,
count: { $sum: 1 }
}
}
);
// Sort in pagination query.
aggregationPaginated.push(
{
$sort: sorting
}
);
// Paginate.
aggregationPaginated.push(
{
$limit: skip + length
},
{
$skip: skip
}
);
// I use mongoose.
// Get total count.
model.count(function(errCount, totalCount) {
// Count filtered.
model.aggregate(aggregation)
.allowDiskUse(true)
.exec(
function(errFind, documents) {
if (errFind) {
// Errors.
res.status(503);
return res.json({
'success': false,
'response': 'err_counting'
});
}
else {
// Number of filtered elements.
var numFiltered = documents[0].count;
// Filter, sort and pagiante.
model.request.aggregate(aggregationPaginated)
.allowDiskUse(true)
.exec(
function(errFindP, documentsP) {
if (errFindP) {
// Errors.
res.status(503);
return res.json({
'success': false,
'response': 'err_pagination'
});
}
else {
return res.json({
'success': true,
'recordsTotal': totalCount,
'recordsFiltered': numFiltered,
'response': documentsP
});
}
});
}
});
});
Assaf, there's going to be some enhancements to the aggregation framework in the near future that may allow you to do your calculations in one pass easily, but right now, it is best to perform your calculations by running two queries in parallel: one to aggregate the #posts for your top authors, and another aggregation to calculate the total posts for all authors. Also, note that if all you need to do is a count on documents, using the count function is a very efficient way of performing the calculation. MongoDB caches counts within btree indexes allowing for very quick counts on queries.
If these aggregations turn out to be slow there are a couple of strategies. First off, keep in mind that you want start the query with a $match if applicable to reduce the result set. $matches can also be speed up by indexes. Secondly, you can perform these calculations as pre-aggregations. Instead of possible running these aggregations every time a user accesses some part of your app, have the aggregations run periodically in the background and store the aggregations in a collection that contains pre-aggregated values. This way, your pages can simply query the pre-calculated values from this collection.
$facets aggregation operation can be used for Mongo versions >= 3.4.
This allows to fork at a particular stage of a pipeline in multiple sub-pipelines allowing in this case to build one sub pipeline to count the number of documents and another one for sorting, skipping, limiting.
This allows to avoid making same stages multiple times in multiple requests.
If you don't want to run two queries in parallel (one to aggregate the #posts for your top authors, and another aggregation to calculate the total posts for all authors) you can just remove $limit on pipeline and on results you can use
totalCount = results.length;
results.slice(number of skip,number of skip + number of limit);
ex:
db.article.aggregate([
{ $group : {
_id : "$author",
posts : { $sum : 1 }
}},
{ $sort : { posts: -1 } }
//{$skip : yourSkip}, //--remove this
//{ $limit : yourLimit }, // remove this too
]).exec(function(err, results){
var totalCount = results.length;//--GEt total count here
results.slice(yourSkip,yourSkip+yourLimit);
});
I got the same problem, and solved with $project, $slice and $$ROOT.
db.article.aggregate(
{ $group : {
_id : '$author',
posts : { $sum : 1 },
articles: {$push: '$$ROOT'},
}},
{ $sort : { posts: -1 } },
{ $project: {total: '$posts', articles: {$slice: ['$articles', from, to]}},
).toArray(function(err, result){
var articles = result[0].articles;
var total = result[0].total;
});
You need to declare from and to variable.
https://docs.mongodb.com/manual/reference/operator/aggregation/slice/
in my case, we use $out stage to dump result set from aggeration into a temp/cache table, then count it. and, since we need to sort and paginate results, we add index on the temp table and save table name in session, remove the table on session closing/cache timeout.
I get total count with aggregate().toArray().length

Find largest document size in MongoDB

Is it possible to find the largest document size in MongoDB?
db.collection.stats() shows average size, which is not really representative because in my case sizes can differ considerably.
You can use a small shell script to get this value.
Note: this will perform a full table scan, which will be slow on large collections.
let max = 0, id = null;
db.test.find().forEach(doc => {
const size = Object.bsonsize(doc);
if(size > max) {
max = size;
id = doc._id;
}
});
print(id, max);
Note: this will attempt to store the whole result set in memory (from .toArray) . Careful on big data sets. Do not use in production! Abishek's answer has the advantage of working over a cursor instead of across an in memory array.
If you also want the _id, try this. Given a collection called "requests" :
// Creates a sorted list, then takes the max
db.requests.find().toArray().map(function(request) { return {size:Object.bsonsize(request), _id:request._id}; }).sort(function(a, b) { return a.size-b.size; }).pop();
// { "size" : 3333, "_id" : "someUniqueIdHere" }
Starting Mongo 4.4, the new aggregation operator $bsonSize returns the size in bytes of a given document when encoded as BSON.
Thus, in order to find the bson size of the document whose size is the biggest:
// { "_id" : ObjectId("5e6abb2893c609b43d95a985"), "a" : 1, "b" : "hello" }
// { "_id" : ObjectId("5e6abb2893c609b43d95a986"), "c" : 1000, "a" : "world" }
// { "_id" : ObjectId("5e6abb2893c609b43d95a987"), "d" : 2 }
db.collection.aggregate([
{ $group: {
_id: null,
max: { $max: { $bsonSize: "$$ROOT" } }
}}
])
// { "_id" : null, "max" : 46 }
This:
$groups all items together
$projects the $max of documents' $bsonSize
$$ROOT represents the current document for which we get the bsonsize
Finding the largest documents in a MongoDB collection can be ~100x faster than the other answers using the aggregation framework and a tiny bit of knowledge about the documents in the collection. Also, you'll get the results in seconds, vs. minutes with the other approaches (forEach, or worse, getting all documents to the client).
You need to know which field(s) in your document might be the largest ones - which you almost always will know. There are only two practical1 MongoDB types that can have variable sizes:
arrays
strings
The aggregation framework can calculate the length of each. Note that you won't get the size in bytes for arrays, but the length in elements. However, what matters more typically is which the outlier documents are, not exactly how many bytes they take.
Here's how it's done for arrays. As an example, let's say we have a collections of users in a social network and we suspect the array friends.ids might be very large (in practice you should probably keep a separate field like friendsCount in sync with the array, but for the sake of example, we'll assume that's not available):
db.users.aggregate([
{ $match: {
'friends.ids': { $exists: true }
}},
{ $project: {
sizeLargestField: { $size: '$friends.ids' }
}},
{ $sort: {
sizeLargestField: -1
}},
])
The key is to use the $size aggregation pipeline operator. It only works on arrays though, so what about text fields? We can use the $strLenBytes operator. Let's say we suspect the bio field might also be very large:
db.users.aggregate([
{ $match: {
bio: { $exists: true }
}},
{ $project: {
sizeLargestField: { $strLenBytes: '$bio' }
}},
{ $sort: {
sizeLargestField: -1
}},
])
You can also combine $size and $strLenBytes using $sum to calculate the size of multiple fields. In the vast majority of cases, 20% of the fields will take up 80% of the size (if not 10/90 or even 1/99), and large fields must be either strings or arrays.
1 Technically, the rarely used binData type can also have variable size.
Well.. this is an old question.. but - I thought to share my cent about it
My approach - use Mongo mapReduce function
First - let's get the size for each document
db.myColection.mapReduce
(
function() { emit(this._id, Object.bsonsize(this)) }, // map the result to be an id / size pair for each document
function(key, val) { return val }, // val = document size value (single value for each document)
{
query: {}, // query all documents
out: { inline: 1 } // just return result (don't create a new collection for it)
}
)
This will return all documents sizes although it worth mentioning that saving it as a collection is a better approach (the result is an array of results inside the result field)
Second - let's get the max size of document by manipulating this query
db.metadata.mapReduce
(
function() { emit(0, Object.bsonsize(this))}, // mapping a fake id (0) and use the document size as value
function(key, vals) { return Math.max.apply(Math, vals) }, // use Math.max function to get max value from vals (each val = document size)
{ query: {}, out: { inline: 1 } } // same as first example
)
Which will provide you a single result with value equals to the max document size
In short:
you may want to use the first example and save its output as a collection (change out option to the name of collection you want) and applying further aggregations on it (max size, min size, etc.)
-OR-
you may want to use a single query (the second option) for getting a single stat (min, max, avg, etc.)
If you're working with a huge collection, loading it all at once into memory will not work, since you'll need more RAM than the size of the entire collection for that to work.
Instead, you can process the entire collection in batches using the following package I created:
https://www.npmjs.com/package/mongodb-largest-documents
All you have to do is provide the MongoDB connection string and collection name. The script will output the top X largest documents when it finishes traversing the entire collection in batches.
Inspired by Elad Nana's package, but usable in a MongoDB console :
function biggest(collection, limit=100, sort_delta=100) {
var documents = [];
cursor = collection.find().readPref("nearest");
while (cursor.hasNext()) {
var doc = cursor.next();
var size = Object.bsonsize(doc);
if (documents.length < limit || size > documents[limit-1].size) {
documents.push({ id: doc._id.toString(), size: size });
}
if (documents.length > (limit + sort_delta) || !cursor.hasNext()) {
documents.sort(function (first, second) {
return second.size - first.size;
});
documents = documents.slice(0, limit);
}
}
return documents;
}; biggest(db.collection)
Uses cursor
Gives a list of the limit biggest documents, not just the biggest
Sort & cut output list to limit every sort_delta
Use nearest as read preference (you might also want to use rs.slaveOk() on the connection to be able to list collections if you're on a slave node)
As Xavier Guihot already mentioned, a new $bsonSize aggregation operator was introduced in Mongo 4.4, which can give you the size of the object in bytes. In addition to that just wanted to provide my own example and some stats.
Usage example:
// I had an `orders` collection in the following format
[
{
"uuid": "64178854-8c0f-4791-9e9f-8d6767849bda",
"status": "new",
...
},
{
"uuid": "5145d7f1-e54c-44d9-8c10-ca3ce6f472d6",
"status": "complete",
...
},
...
];
// and I've run the following query to get documents' size
db.getCollection("orders").aggregate(
[
{
$match: { status: "complete" } // pre-filtered only completed orders
},
{
$project: {
uuid: 1,
size: { $bsonSize: "$$ROOT" } // added object size
}
},
{
$sort: { size: -1 }
},
],
{ allowDiskUse: true } // required as I had huge amount of data
);
as a result, I received a list of documents by size in descending order.
Stats:
For the collection of ~3M records and ~70GB size in total, the query above took ~6.5 minutes.