Limiting results in MongoDB but still getting the full count? - mongodb

For speed, I'd like to limit a query to 10 results
db.collection.find( ... ).limit(10)
However, I'd also like to know the total count, so I can say "there were 124 but I only have 10". Is there an efficient way to do this?

By default, count() ignores limit() and counts the results in the entire query.
So when you, for example, do this:
var a = db.collection.find(...).limit(10);
running a.count() will give you the total count of your query.

Calling count(true) instead (count(1) also works, since any truthy value does) makes the count respect limit and skip.

The accepted answer by @johnnycrab is for the mongo shell.
If you have to write the same code in Node.js and Express.js, you will have to use it like this to be able to use the count() function along with the result of toArray().
var curFind = db.collection('tasks').find({query});
Then you can run two functions after it like this (one nested in the other)
curFind.count(function (e, count) {
// Use count here
curFind.skip(0).limit(10).toArray(function(err, result) {
// Use result here and count here
});
});
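Newer releases of the Node.js driver steer away from callbacks and from counting on the cursor, so a promise-based variant of the same idea might look like the sketch below. It assumes `collection` is a driver Collection object; the helper name findPageWithTotal is made up for illustration, and countDocuments() stands in for the cursor count used above.

```javascript
// Hypothetical helper: returns one page of documents plus the total
// number of documents matching the same filter.
// Assumes `collection` is a MongoDB Node.js driver Collection.
async function findPageWithTotal(collection, filter, limit) {
  // Run both requests concurrently; the count is not affected by limit().
  const [total, docs] = await Promise.all([
    collection.countDocuments(filter),
    collection.find(filter).limit(limit).toArray(),
  ]);
  return { total, docs };
}

// Possible usage inside an async route handler:
//   const { total, docs } = await findPageWithTotal(db.collection('tasks'), {}, 10);
```

Running the two requests with Promise.all means the response time is bounded by the slower of the two rather than by their sum.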

cursor.count() should ignore cursor.skip() and cursor.limit() by default.
Source: http://docs.mongodb.org/manual/reference/method/cursor.count/#cursor.count

You can use a $facet stage which processes multiple aggregation pipelines within a single stage on the same set of input documents:
// { item: "a" }
// { item: "b" }
// { item: "c" }
db.collection.aggregate([
{ $facet: {
limit: [{ $limit: 2 }],
total: [{ $count: "count" }]
}},
{ $set: { total: { $first: "$total.count" } } }
])
// { limit: [{ item: "a" }, { item: "b" }], total: 3 }
This way, within the same query, you can get both some documents (limit: [{ $limit: 2 }]) and the total count of documents ({ $count: "count" }).
The final $set stage is an optional clean-up step, just there to project the result of the $count stage, such that "total" : [ { "count" : 3 } ] becomes total: 3.
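Two details are worth keeping in mind with this pattern: the array form of $first needs a fairly recent server (MongoDB 4.4, as far as I know), and $count emits no document at all when nothing matches, so the total facet can be an empty array. A minimal sketch that builds the pipeline and unwraps the result in application code instead; the function names and the `docs` facet name are illustrative, not part of any API:

```javascript
// Build a $facet pipeline returning one page plus the total count.
// `filter`, `skip` and `limit` are illustrative parameters.
function buildFacetPage(filter, skip, limit) {
  return [
    { $match: filter },
    {
      $facet: {
        docs: [{ $skip: skip }, { $limit: limit }],
        total: [{ $count: "count" }],
      },
    },
  ];
}

// $count emits no document when nothing matches, so `total` can be [].
function unwrapFacetPage(facetDoc) {
  const total = facetDoc.total.length > 0 ? facetDoc.total[0].count : 0;
  return { docs: facetDoc.docs, total };
}

// Possible usage in the shell:
//   unwrapFacetPage(db.collection.aggregate(buildFacetPage({}, 0, 2)).next())
```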

There is a solution using $push and $slice: https://stackoverflow.com/a/39784851/4752635
I prefer using two queries:
1. The first filters and then groups by ID to get the number of filtered elements. Do not sort here, it is unnecessary.
2. The second filters, sorts and paginates.
The solution that pushes $$ROOT and uses $slice runs into the 16MB document size limit for large collections. Also, for large collections, the two queries together seem to run faster than the single one that pushes $$ROOT. You can run them in parallel as well, so you are limited only by the slower of the two queries (probably the one that sorts).
I have settled on this solution using two queries and the aggregation framework (note: I use Node.js in this example, but the idea is the same):
var aggregation = [
{
// If you can match fields at the beginning, match as many as possible as early as possible.
$match: {...}
},
{
// Projection.
$project: {...}
},
{
// Some things you can match only after projection or grouping, so do it now.
$match: {...}
}
];
// Copy the filtering stages of the pipeline; they are the same for the count query and the pagination query.
var aggregationPaginated = aggregation.slice(0);
// Count filtered elements.
aggregation.push(
{
$group: {
_id: null,
count: { $sum: 1 }
}
}
);
// Sort in pagination query.
aggregationPaginated.push(
{
$sort: sorting
}
);
// Paginate.
aggregationPaginated.push(
{
$limit: skip + length
},
{
$skip: skip
}
);
// I use mongoose.
// Get total count.
model.count(function(errCount, totalCount) {
// Count filtered.
model.aggregate(aggregation)
.allowDiskUse(true)
.exec(
function(errFind, documents) {
if (errFind) {
// Errors.
res.status(503);
return res.json({
'success': false,
'response': 'err_counting'
});
}
else {
// Number of filtered elements.
var numFiltered = documents[0].count;
// Filter, sort and paginate.
model.aggregate(aggregationPaginated)
.allowDiskUse(true)
.exec(
function(errFindP, documentsP) {
if (errFindP) {
// Errors.
res.status(503);
return res.json({
'success': false,
'response': 'err_pagination'
});
}
else {
return res.json({
'success': true,
'recordsTotal': totalCount,
'recordsFiltered': numFiltered,
'response': documentsP
});
}
});
}
});
});
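The pipeline construction above can be condensed into one helper that derives both pipelines from the shared filtering stages. This is only a sketch: the names buildPipelines and readCount are hypothetical. readCount also covers the case where no document passes the filter, in which the $group stage returns an empty array and documents[0] would be undefined.

```javascript
// Derive the count and page pipelines from shared filter stages.
function buildPipelines(filterStages, sorting, skip, length) {
  const countPipeline = [
    ...filterStages,
    { $group: { _id: null, count: { $sum: 1 } } },
  ];
  const pagePipeline = [
    ...filterStages,
    { $sort: sorting },
    { $limit: skip + length },
    { $skip: skip },
  ];
  return { countPipeline, pagePipeline };
}

// The count pipeline yields no document when nothing matches.
function readCount(countResult) {
  return countResult.length > 0 ? countResult[0].count : 0;
}
```

Because neither pipeline depends on the other's result, both aggregate calls can be started together and awaited with Promise.all instead of being nested.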

Related

Using $sum on a existent field returns a value of 0 [duplicate]

I have a collection students with documents in the following format:-
{
_id:"53fe74a866455060e003c2db",
name:"sam",
subject:"maths",
marks:"77"
}
{
_id:"53fe79cbef038fee879263d2",
name:"ryan",
subject:"bio",
marks:"82"
}
{
_id:"53fe74a866456060e003c2de",
name:"tony",
subject:"maths",
marks:"86"
}
I want to get the count of total marks of all the students with subject = "maths". So I should get 163 as sum.
db.students.aggregate([{ $match : { subject : "maths" } },
{ "$group" : { _id : "$subject", totalMarks : { $sum : "$marks" } } }])
Now I should get the following result-
{"result":[{"_id":"53fe74a866455060e003c2db", "totalMarks":163}], "ok":1}
But I get-
{"result":[{"_id":"53fe74a866455060e003c2db", "totalMarks":0}], "ok":1}
Can someone point out what I might be doing wrong here?
Your current schema stores the marks field as a string, but $sum only adds numeric values and ignores everything else, which is why you get 0. You need a numeric data type for the aggregation framework to work out the sum. Alternatively, you can use MapReduce to calculate the sum, since it allows native JavaScript functions like parseInt() on your object properties in its map function. So overall you have two choices.
Option 1: Update Schema (Change Data Type)
The first would be to change the schema or add another field in your document that has the actual numerical value not the string representation. If your collection document size is relatively small, you could use a combination of the mongodb's cursor find(), forEach() and update() methods to change your marks schema:
db.students.find({ "marks": { "$type": 2 } }).snapshot().forEach(function(doc) {
    db.students.update(
        { "_id": doc._id, "marks": { "$type": 2 } },
        { "$set": { "marks": parseInt(doc.marks, 10) } }
    );
});
For relatively large collection sizes, your db performance will be slow and it's recommended to use mongo bulk updates for this:
MongoDB versions >= 2.6 and < 3.2:
var bulk = db.students.initializeUnorderedBulkOp(),
    counter = 0;
db.students.find({"marks": {"$exists": true, "$type": 2 }}).forEach(function (doc) {
    bulk.find({ "_id": doc._id }).updateOne({
        "$set": { "marks": parseInt(doc.marks, 10) }
    });
    counter++;
    if (counter % 1000 === 0) {
        // Execute per 1000 operations
        bulk.execute();
        // Re-initialize every 1000 update statements
        bulk = db.students.initializeUnorderedBulkOp();
    }
});
// Clean up remaining operations in queue
if (counter % 1000 !== 0) bulk.execute();
MongoDB version 3.2 and newer:
var ops = [],
    cursor = db.students.find({"marks": {"$exists": true, "$type": 2 }});
cursor.forEach(function (doc) {
    ops.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": { "$set": { "marks": parseInt(doc.marks, 10) } }
        }
    });
    // Flush every 1000 operations
    if (ops.length === 1000) {
        db.students.bulkWrite(ops);
        ops = [];
    }
});
// Flush whatever is left in the queue
if (ops.length > 0) db.students.bulkWrite(ops);
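The batching logic can also be separated from the database calls, which makes it easier to check on its own. The sketch below is not tied to any driver; buildMarkUpdates is a made-up name, and skipping values that do not parse as numbers is my assumption about what you would want to happen with dirty data.

```javascript
// Turn documents with string marks into batches of bulkWrite operations.
function buildMarkUpdates(docs, batchSize) {
  const batches = [];
  let current = [];
  for (const doc of docs) {
    const marks = parseInt(doc.marks, 10);
    if (Number.isNaN(marks)) continue; // skip values that do not parse as numbers
    current.push({
      updateOne: {
        filter: { _id: doc._id },
        update: { $set: { marks } },
      },
    });
    if (current.length === batchSize) {
      batches.push(current);
      current = [];
    }
  }
  // Keep the final, possibly smaller, batch.
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Each returned batch can then be handed to bulkWrite() in turn.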
Option 2: Run MapReduce
The second approach would be to rewrite your query with MapReduce where you can use the JavaScript function parseInt().
In your MapReduce operation, define the map function that processes each input document. This function maps the converted marks string value to the subject for each document, and emits the subject and converted marks pair. This is where the native JavaScript function parseInt() can be applied. Note: in the function, this refers to the document that the map-reduce operation is processing:
var mapper = function () {
var x = parseInt(this.marks);
emit(this.subject, x);
};
Next, define the corresponding reduce function with two arguments keySubject and valuesMarks. valuesMarks is an array whose elements are the integer marks values emitted by the map function and grouped by keySubject.
The function reduces the valuesMarks array to the sum of its elements.
var reducer = function(keySubject, valuesMarks) {
return Array.sum(valuesMarks);
};
db.students.mapReduce(
    mapper,
    reducer,
    {
        out : "example_results",
        query: { subject : "maths" }
    }
);
With your collection, the above will put your MapReduce aggregation result in a new collection db.example_results. Thus, db.example_results.find() will output:
/* 0 */
{
"_id" : "maths",
"value" : 163
}
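On MongoDB 4.0 and later there is, as far as I know, a third route that needs neither a schema change nor MapReduce: convert the string inside the pipeline with $toInt. The sketch below shows such a pipeline next to a plain JavaScript function that computes the same thing on the sample documents; the totalMarks helper exists only for illustration.

```javascript
// Pipeline that converts marks on the fly (assumes MongoDB 4.0+ for $toInt).
const pipeline = [
  { $match: { subject: "maths" } },
  { $group: { _id: "$subject", totalMarks: { $sum: { $toInt: "$marks" } } } },
];

// Plain JavaScript equivalent, only to show what the pipeline computes.
function totalMarks(docs, subject) {
  return docs
    .filter((d) => d.subject === subject)
    .reduce((sum, d) => sum + parseInt(d.marks, 10), 0);
}

const students = [
  { name: "sam", subject: "maths", marks: "77" },
  { name: "ryan", subject: "bio", marks: "82" },
  { name: "tony", subject: "maths", marks: "86" },
];
console.log(totalMarks(students, "maths")); // 163
```

Note that $toInt raises an error for strings that are not valid integers; $convert with an onError value is the more forgiving variant.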
Possible reasons why your sum is returned as 0:
1. The field you are summing is a string, not a number. Make sure the field contains numeric values.
2. You are using the wrong syntax for $sum:
db.c1.aggregate([{
    $group: {
        _id: "$item",
        price: {
            $sum: "$price"
        },
        count: {
            $sum: 1
        }
    }
}])
Make sure you use "$price" and not "price".
3. One of the silliest mistakes that causes this is a space or tab inside the quotes when specifying the field name. For example, "$price " won't work, but "$price" will.
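To find out which of these causes applies, it can help to ask the server what types the field actually holds. The sketch below builds a small diagnostic pipeline around the $type aggregation operator (available since MongoDB 3.4, to my knowledge); typeHistogramPipeline is a hypothetical helper name, and the whitespace check is only there to catch the stray-space mistake early.

```javascript
// Build a pipeline that counts documents per BSON type of `field`.
function typeHistogramPipeline(field) {
  // Reject leading/trailing whitespace and an already-prefixed name.
  if (field !== field.trim() || field.startsWith("$")) {
    throw new Error("pass a bare field name without surrounding whitespace or a leading $");
  }
  return [{ $group: { _id: { $type: "$" + field }, count: { $sum: 1 } } }];
}

// Possible usage in the shell:
//   db.c1.aggregate(typeHistogramPipeline("price"))
```

If the output groups the documents under "string" or "missing" rather than a numeric type, $sum has nothing to add up.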

Mongo aggregation and MongoError: exception: BufBuilder attempted to grow() to 134217728 bytes, past the 64MB limit

I'm trying to aggregate data from my Mongo collection to produce some statistics for FreeCodeCamp by making a large json file of the data to use later.
I'm running into the error in the title. There doesn't seem to be a lot of information about this, and the other posts here on SO don't have an answer. I'm using the latest version of MongoDB and drivers.
I suspect there is probably a better way to run this aggregation, but it runs fine on a subset of my collection. My full collection is ~7GB.
I'm running the script via node aggScript.js > ~/Desktop/output.json
Here is the relevant code:
MongoClient.connect(secrets.db, function(err, database) {
if (err) {
throw err;
}
database.collection('user').aggregate([
{
$match: {
'completedChallenges': {
$exists: true
}
}
},
{
$match: {
'completedChallenges': {
$ne: ''
}
}
},
{
$match: {
'completedChallenges': {
$ne: null
}
}
},
{
$group: {
'_id': 1, 'completedChallenges': {
$addToSet: '$completedChallenges'
}
}
}
], {
allowDiskUse: true
}, function(err, results) {
if (err) { throw err; }
var aggData = results.map(function(camper) {
return _.flatten(camper.completedChallenges.map(function(challenges) {
return challenges.map(function(challenge) {
return {
name: challenge.name,
completedDate: challenge.completedDate,
solution: challenge.solution
};
});
}), true);
});
console.log(JSON.stringify(aggData));
process.exit(0);
});
});
Aggregate returns a single document containing all the result data, which limits how much data can be returned to the maximum BSON document size.
Assuming that you do actually want all this data, there are two options:
1. Use aggregateCursor instead of aggregate. This returns a cursor rather than a single document, which you can then iterate over.
2. Add a $out stage as the last stage of your pipeline. This tells MongoDB to write your aggregation data to the specified collection. The aggregate command itself returns no data, and you then query that collection as you would any other.
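For the $out option, the change to the pipeline is a single extra stage at the end. A minimal sketch, where withOutStage and the target collection name in the usage comment are made up for illustration:

```javascript
// Return a copy of `pipeline` that writes its result to `targetCollection`.
// $out has to be the last stage, so it is always appended at the end.
function withOutStage(pipeline, targetCollection) {
  return [...pipeline, { $out: targetCollection }];
}

// Possible usage (hypothetical collection name):
//   const stages = withOutStage(existingStages, 'completedChallengesAgg');
```

Keep in mind that $out replaces the target collection if it already exists, so the name should not point at data you want to keep.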
It just means that the result object you are building became too large. This kind of issue should not be affected by the version. The fix implemented for 2.5.0 only prevents the crash from occurring.
You need to filter ($match) properly so that only the data you need ends up in the result, and group on the proper fields. The results are put into a 64MB buffer, so reduce your data: $project only the fields you require in the result, not whole documents.
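You can combine your three $match stages into a single one to shorten the pipeline. Use $nin rather than repeating $ne, because a repeated key in an object literal keeps only its last value:
{
    $match: {
        'completedChallenges': {
            $exists: true,
            $nin: [null, ""]
        }
    }
}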
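One thing to watch when merging several conditions on the same field: JavaScript keeps only the last value for a key that appears twice in an object literal, so the driver never sees the earlier one. The snippet below demonstrates this in plain JavaScript and shows $nin as a single-key way to say "neither null nor empty string".

```javascript
// A repeated key in an object literal keeps only its last value.
const collapsed = { $exists: true, $ne: null, $ne: "" };
console.log(Object.keys(collapsed)); // [ '$exists', '$ne' ]
console.log(collapsed.$ne === "");   // true

// $nin expresses "neither null nor empty string" under a single key.
const matchStage = {
  $match: { completedChallenges: { $exists: true, $nin: [null, ""] } },
};
```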
I had this issue and I couldn't debug the problem so I ended up abandoning the aggregation approach. Instead I just iterated through each entry and created a new collection. Here's a stripped down shell script which might help you see what I mean:
db.new_collection.createIndex({my_key:1}); // for performance, not a necessity
db.old_collection.find({}).noCursorTimeout().forEach(function(doc) {
db.new_collection.update(
{ my_key: doc.my_key },
{
$push: { stuff: doc.stuff, other_stuff: doc.other_stuff},
$inc: { thing: doc.thing},
},
{ upsert: true }
);
});
I don't imagine that this approach would suit everyone, but hopefully that helps anyone who was in my particular situation.

Mongodb aggregation $unwind then count

Here is my problem: in my Mongo database, I have a collection with items like:
{
'id': 1,
'steps': [
{
action: 'start',
info: 'foo'
},
{
action: 'stop',
info: 'bar'
}
]
}
I would like to get the total number of 'start' steps.
I tried to use the MongoDB aggregation framework: I use $unwind on steps.action and $match on steps.action to match 'start'.
However, I get too much data and reach the aggregation's limit:
exception: aggregation result exceeds maximum document size (16MB). I don't need the data, I just want the count, but I couldn't find how to do it (I tried $group without success).
Thanks in advance,
If you want the number of documents that contain a 'start' step, you can use this:
db.test.count({"steps.action":"start"})
However, this counts each document once, even if its steps array contains several steps with action 'start'.
If you need to count every 'start' step, unwind the array, match on steps.action, and then group the results to count them.
db.test.aggregate([{$unwind:"$steps"}, {$match:{"steps.action":"start"}},{ $group: { _id: null, count: { $sum: 1 } } }])
Try this:
db.collection.aggregate([
{ $unwind : "$steps" },
{$match:{'steps.action':'start'}},
{$group:{_id:null,count:{$sum:1}}}
]).pretty()
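If unwinding produces too many intermediate documents, one alternative worth trying is to count the matching elements inside each document with $filter and $size (both available since MongoDB 3.2, as far as I know) and only sum the per-document numbers. This is a sketch rather than a tested drop-in replacement; countActionPipeline and countActions are illustrative names, and the second function just mirrors the pipeline's arithmetic in plain JavaScript.

```javascript
// Count matching array elements per document instead of unwinding.
function countActionPipeline(action) {
  return [
    // Keep only documents that have at least one matching step.
    { $match: { "steps.action": action } },
    {
      $project: {
        _id: 0,
        n: {
          $size: {
            $filter: {
              input: "$steps",
              as: "step",
              cond: { $eq: ["$$step.action", action] },
            },
          },
        },
      },
    },
    { $group: { _id: null, count: { $sum: "$n" } } },
  ];
}

// Plain JavaScript equivalent of what the pipeline computes.
function countActions(docs, action) {
  return docs.reduce(
    (sum, d) => sum + d.steps.filter((s) => s.action === action).length,
    0
  );
}
```

The leading $match can use an index on steps.action, and the final result is a single small document, so the 16MB limit is not a concern.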
In the MongoDB aggregation framework, each pipeline stage is limited to 100MB of RAM, while the result, when returned as a single BSON document, is limited to 16MB.
So $match only on the required condition and $group afterwards, so that the output contains only the required result and stays under 16MB.
You may not need aggregation for this simple query. See below code.
for (var i = 10000; i >= 0; i--) {
var a = {
'id': 1,
'steps': [
{
action: 'start',
info: 'foo'
},
{
action: 'stop',
info: 'bar'
}
]
}
a.id = i;
var rnd = Math.floor((Math.random() * 3) + 1);
if (rnd == 1)
{
a.steps[0].action = 'none';
}
if (rnd == 2)
{
a.steps.push({ action: 'start', info: 'foo' })
}
db.obj.insert(a);
};
This code creates documents with a random number of 'start' actions.
If you only need the number of documents that contain action: 'start', use the query below.
db.obj.count({"steps.action":"start"})
I get the following count in my run:
> db.obj.count({"steps.action":"start"})
6756
But if you need the number of {action: 'start'} elements across the documents, an aggregation query is needed.
You unwind, then match:
db.obj.aggregate(
[
{ $unwind : "$steps"},
{$match: { "steps.action" : "start" }},
{
$group:
{
_id: null
,count: { $sum: 1 }
}
}
]
)
This outputs:
{ "_id" : null, "count" : 10054 }
If you get the exception again, use the allowDiskUse: true option:
db.obj.aggregate(
[
....
]
,
{
allowDiskUse : true
}
)

How to limit number of updating documents in mongodb

How can I implement something similar to db.collection.find().limit(10), but for updating documents?
Right now I'm using something really crappy: getting the documents with db.collection.find().limit() and then updating them.
In general, I want to return a given number of records and change one field in each of them.
Thanks.
You can use:
db.collection.find().limit(NUMBER_OF_ITEMS_YOU_WANT_TO_UPDATE).forEach(
function (e) {
e.fieldToChange = "blah";
....
db.collection.save(e);
}
);
(Credits for forEach code: MongoDB: Updating documents using data from the same document)
What this will do is only change the number of entries you specify. So if you want to add a field called "newField" with value 1 to only half of your entries inside "collection", for example, you can put in
db.collection.find().limit(Math.floor(db.collection.count() / 2)).forEach(
function (e) {
e.newField = 1;
db.collection.save(e);
}
);
If you then want to make the other half also have "newField" but with value 2, you can do an update with the condition that newField doesn't exist:
db.collection.update( { newField : { $exists : false } }, { $set : { newField : 2 } }, {multi : true} );
Using forEach to individually update each document is slow. You can update the documents in bulk using
ids = db.collection.find(<condition>).limit(<limit>).map(
function(doc) {
return doc._id;
}
);
db.collection.updateMany({_id: {$in: ids}}, <update>)
The solutions that iterate over all objects then update them individually are very slow.
Retrieving them all then updating simultaneously using $in is more efficient.
ids = People.where(firstname: 'Pablo').limit(10000).only(:_id).to_a.map(&:id)
People.in(_id: ids).update_all(lastname: 'Cantero')
The query is written using Mongoid, but can be easily rewritten in Mongo Shell as well.
Unfortunately the workaround you have is the only way to do it AFAIK. There is a boolean flag multi which will either update all the matches (when true) or update the 1st match (when false).
As the answer states, there is still no way to limit the number of documents to update (or delete) to a value > 1. A workaround is to use something like:
db.collection.find(<condition>).limit(<limit>).forEach(function(doc){db.collection.update({_id:doc._id},{<your update>})})
If your id is a sequence number and not an ObjectId you can do this in a for loop:
let batchSize = 10;
for (let i = 0; i <= 1000000; i += batchSize) {
    db.collection.update({$and: [{"_id": {$lte: i + batchSize}}, {"_id": {$gt: i}}]}, {<your update>}, {multi: true})
}
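The range arithmetic in that loop can be pulled into a small generator, which keeps the batch boundaries easy to verify. The sketch below assumes sequential numeric _id values starting at 1; idRangeFilters is a made-up name, and `update` in the usage comment stands for whatever update document you need.

```javascript
// Yield one range filter per batch for sequential numeric _id values.
function* idRangeFilters(maxId, batchSize) {
  for (let start = 0; start < maxId; start += batchSize) {
    // Each filter covers the half-open range (start, start + batchSize].
    yield { _id: { $gt: start, $lte: Math.min(start + batchSize, maxId) } };
  }
}

// Possible usage in the shell:
//   for (const filter of idRangeFilters(1000000, 10)) {
//     db.collection.updateMany(filter, update);
//   }
```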
let fetchStandby = await db.model.distinct("key",{});
fetchStandby = fetchStandby.slice(0, no_of_docs_to_be_updated)
let fetch = await db.model.updateMany({
key: { $in: fetchStandby }
}, {
$set:{"qc.status": "pending"}
})
I also recently wanted something like this. I think querying for a long list of _id just to update in an $in is perhaps slow too, so I tried to use an aggregation+merge
while (true) {
const record = db.records.findOne({ isArchived: false }, {_id: 1})
if (!record) {
print("No more records")
break
}
db.records.aggregate([
{ $match: { isArchived: false } },
{ $limit: 100 },
{
$project: {
_id: 1,
isArchived: {
$literal: true
},
updatedAt: {
$literal: new Date()
}
}
},
{
$merge: {
into: "records",
on: "_id",
whenMatched: "merge"
}
}
])
print("Done update")
}
But feel free to comment on whether this is better or worse than a bulk update with $in.