MongoDB aggregation: $unwind then count

Here is my problem: in my Mongo database, I have a collection with items like:
{
  'id': 1,
  'steps': [
    {
      action: 'start',
      info: 'foo'
    },
    {
      action: 'stop',
      info: 'bar'
    }
  ]
}
I would like to get the total number of steps with action 'start'.
I tried to use the MongoDB aggregation framework: I use $unwind on steps and $match on steps.action to match 'start'.
However, I get too much data and hit the aggregation limit:
exception: aggregation result exceeds maximum document size (16MB). I don't need the data, I just want the count, but I couldn't find how to do it (tried with $group without success).
Thanks in advance,

If you want the count of matching documents you can use this:
db.test.count({"steps.action": "start"})
but this will not account for documents whose steps array contains multiple steps with action start.
When you need to count all steps with action start, you have to unwind the array, match on steps.action, and then group the results to count them:
db.test.aggregate([
  { $unwind: "$steps" },
  { $match: { "steps.action": "start" } },
  { $group: { _id: null, count: { $sum: 1 } } }
])

Try this:
db.collection.aggregate([
  { $unwind: "$steps" },
  { $match: { "steps.action": "start" } },
  { $group: { _id: null, count: { $sum: 1 } } }
]).pretty()

In MongoDB's aggregation framework, each pipeline stage has a 100MB RAM restriction, while the result (returned as a single BSON document in older versions) has a maximum size of 16MB.
So $match on the required condition as early as possible and $group the results, so that only the required output is returned and it stays under 16MB.
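For example, a minimal sketch (reusing the collection and field names from the question): filtering with $match before $unwind discards non-matching documents early, so far less data flows through the rest of the pipeline.
db.test.aggregate([
  // Keep only documents that contain at least one 'start' step.
  { $match: { "steps.action": "start" } },
  // Produce one document per step.
  { $unwind: "$steps" },
  // Drop the non-'start' steps that came along with matching documents.
  { $match: { "steps.action": "start" } },
  // Count the surviving steps; only one small result document is returned.
  { $group: { _id: null, count: { $sum: 1 } } }
])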

You may not need aggregation for this simple query. See the code below.
for (var i = 10000; i >= 0; i--) {
  var a = {
    'id': 1,
    'steps': [
      {
        action: 'start',
        info: 'foo'
      },
      {
        action: 'stop',
        info: 'bar'
      }
    ]
  };
  a.id = i;
  var rnd = Math.floor((Math.random() * 3) + 1);
  if (rnd == 1) {
    a.steps[0].action = 'none';
  }
  if (rnd == 2) {
    a.steps.push({ action: 'start', info: 'foo' });
  }
  db.obj.insert(a);
}
This code creates a random number of 'start' actions per document.
If you only need the number of documents that contain action: 'start', use the query below.
db.obj.count({"steps.action":"start"})
I get the following count in my run.
> db.obj.count({"steps.action":"start"})
6756
But if you need the number of { action: 'start' } occurrences across the documents, then an aggregation query is needed.
You unwind, then match, then group:
db.obj.aggregate([
  { $unwind: "$steps" },
  { $match: { "steps.action": "start" } },
  {
    $group: {
      _id: null,
      count: { $sum: 1 }
    }
  }
])
This outputs:
{ "_id" : null, "count" : 10054 }
If you get the exception again, use the **allowDiskUse : true** option (see the aggregate documentation):
db.obj.aggregate(
  [
    ....
  ],
  {
    allowDiskUse: true
  }
)

Related

Convert ObjectID to String in mongo Aggregation

I'm in this scenario right now:
I have a collection X:
{
  _id: ObjectId('56edbb4d5f084a51131dd4c6'),
  userRef: ObjectId('56edbb4d5f084a51131dd4c6'),
  serialNumber: 'A123123',
  ...
}
I need to aggregate all documents, grouping them by userRef + serialNumber, so I'm trying to use $concat like this:
$group: {
  _id: {
    '$concat': ['$userRef', '-', '$serialNumber']
  },
  ...
So basically, in my MongoDB aggregation, I need to group documents by the concatenation of an ObjectId and a string. However, it seems that $concat only accepts strings as parameters:
uncaught exception: aggregate failed: {
  "errmsg" : "exception: $concat only supports strings, not OID",
  "code" : 16702,
  "ok" : 0
}
Is there a way to convert an ObjectId to a String within an aggregation expression?
EDIT:
This question is related, but the solution doesn't fit my problem. (Especially because I can't use ObjectId.toString() during the aggregation.)
Indeed, I couldn't find any ObjectId().toString() operation in Mongo's documentation, but I wonder if there's any trick that can be done in this case.
Now you can try the $toString aggregation operator, which simply converts an ObjectId to a string:
db.collection.aggregate([
{ "$addFields": {
"userRef": { "$toString": "$userRef" }
}},
{ "$group": {
"_id": { "$concat": ["$userRef", "-", "$serialNumber"] }
}}
])
I couldn't find a way to do what I wanted, so instead I created a MapReduce function that generated the keys the way I wanted (concatenating other keys).
In the end, it looked something like this:
db.collection('myCollection').mapReduce(
  function() {
    emit(
      this.userRef.str + '-' + this.serialNumber, {
        count: 1,
        whateverValue1: this.value1,
        whateverValue2: this.value2,
        ...
      }
    )
  },
  function(key, values) {
    var reduce = {}
    .... my reduce function....
    return reduce
  },
  {
    query: {
      ...filters_here....
    },
    out: 'name_of_output_collection'
  }
);
You can simply use $toString to apply $concat to ObjectIds in the following way:
$group: {
  '_id': {
    '$concat': [
      { '$toString': '$userRef' },
      '-',
      { '$toString': '$serialNumber' }
    ]
  },
}
I think you may try to resolve it by using an array which contains both fields:
{ $project: { newkey: ['$userRef', '$serialNumber'] } },
{ $match: { newkey: { $in: filterArray } } }
This may match the data with both fields against the filter. Please note that the elements in the newkey array must have the same data types as the filterArray elements.
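For illustration, a minimal sketch of what such a filterArray could look like (the ObjectId and serial values here are hypothetical, not from the original answer): since newkey is a two-element array, each filter element must be a matching [ObjectId, string] pair for the equality match to succeed.
// Hypothetical filter values, for illustration only:
var filterArray = [
  [ObjectId('56edbb4d5f084a51131dd4c6'), 'A123123'],
  [ObjectId('56edbb4d5f084a51131dd4c7'), 'B456456']
];
db.collection.aggregate([
  { $project: { newkey: ['$userRef', '$serialNumber'] } },
  // Matches when newkey equals one of the [ObjectId, string] pairs exactly;
  // element types and order must line up for array equality to hold.
  { $match: { newkey: { $in: filterArray } } }
])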
You can use $substr (https://docs.mongodb.com/manual/reference/operator/aggregation/substr/#exp._S_substr) to cast numeric values to strings before $concat.
This is a sample of code that's working for me.
group_id_i['_id'] = {
'$concat' => [
{ '$substr' => [ {'$year' => '$t'}, 0, -1] }, '-',
{ '$substr' => [ {'$month' => '$t'}, 0, -1] }, '-',
{ '$substr' => [ {'$dayOfMonth' => '$t'}, 0, -1] }
]
}
Where t is a DateTime field, this aggregation returns data like so:
{
"_id" => "28-9-2016",
"i" => 2
}
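For reference, the same idea in mongo shell syntax (a sketch mirroring the Ruby sample above; the field t and counter i are assumed to match it):
db.collection.aggregate([
  { $group: {
    _id: {
      // $substr with length -1 takes the rest of the string, so the numeric
      // results of $year/$month/$dayOfMonth are converted to whole strings.
      $concat: [
        { $substr: [{ $year: "$t" }, 0, -1] }, "-",
        { $substr: [{ $month: "$t" }, 0, -1] }, "-",
        { $substr: [{ $dayOfMonth: "$t" }, 0, -1] }
      ]
    },
    i: { $sum: 1 }
  }}
])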

MongoDB - change simple field into an object

In MongoDB, I want to change the structure of my documents from:
{
  discount: 10,
  discountType: "AMOUNT"
}
to:
{
  discount: {
    value: 10,
    type: "AMOUNT"
  }
}
so I tried the following query in the mongo shell:
db.discounts.update({},
  {
    $rename: {
      discount: "discount.value",
      discountType: "discount.type"
    }
  },
  {multi: true}
)
but it throws an error:
"writeError" : {
"code" : 2,
"errmsg" : "The source and target field for $rename must not be on the same path: discount: \"discount.value\""
}
A workaround that comes to mind is to do it in 2 steps: first assign the new structure to a new field (let's say discount2) and then rename it to discount. But maybe there is a way to do it in one step?
The simplest way is to do it in two steps as you allude to in your question; initially renaming discount to a temporary field name so that it can be reused in the second step:
db.discounts.update({}, {$rename: {discount: 'temp'}}, {multi: true})
db.discounts.update({},
{$rename: {temp: 'discount.value', discountType: 'discount.type'}},
{multi: true})
The reason you are getting this error is that, as mentioned in the documentation:
The $rename operator logically performs an $unset of both the old name and the new name, and then performs a $set operation with the new name. As such, the operation may not preserve the order of the fields in the document; i.e. the renamed field may move within the document.
The problem is that you can't $set and $unset the same field at the same time in MongoDB. The solution is to use bulk operations to update your documents and change their structure, and even then you need a target field name that doesn't already exist in your collection. Using "Bulk" operations also gives you maximum efficiency.
MongoDB 3.2 or newer
MongoDB 3.2 deprecates Bulk() and its associated methods. You need to use the .bulkWrite() method.
var operations = [];
db.discounts.find().forEach(function(doc) {
  var discount = doc.discount;
  var discountType = doc.discountType;
  var operation = { 'updateOne': {
    'filter': { '_id': doc._id },
    'update': {
      '$unset': { 'discount': '', 'discountType': '' },
      '$set': { 'discounts.value': discount, 'discounts.type': discountType }
    }
  }};
  operations.push(operation);
});
db.discounts.bulkWrite(operations, {
  ordered: true,
  writeConcern: { w: "majority", wtimeout: 5000 }
});
Which yields:
{
  "_id" : ObjectId("56682a02e6a2321d88f6d078"),
  "discounts" : {
    "value" : 10,
    "type" : "AMOUNT"
  }
}
MongoDB 2.6
Prior to MongoDB 3.2, with version 2.6 or newer, you can use the "Bulk" API.
var bulk = db.discounts.initializeOrderedBulkOp();
var count = 0;
db.discounts.find().forEach(function(doc) {
  var discount = doc.discount;
  var discountType = doc.discountType;
  bulk.find({ '_id': doc._id }).updateOne({
    '$unset': { 'discount': '', 'discountType': '' },
    '$set': { 'discounts.value': discount, 'discounts.type': discountType }
  });
  count++;
  if (count % 500 === 0) {
    bulk.execute();
    bulk = db.discounts.initializeOrderedBulkOp();
  }
})
if (count % 500 !== 0)
  bulk.execute();
This query yields the same result as the previous one.
Thanks to answers from "Update MongoDB field using value of another field" I figured out the following solution:
db.discounts.find().snapshot().forEach(
  function(elem) {
    elem.discount = {
      value: elem.discount,
      type: elem.discountType
    };
    delete elem.discountType;
    db.discounts.save(elem);
  }
)
I quite like it because the source code reads nicely, but performance suffers for a large number of documents.

MongoDB aggregate query isn't returning the proper sum when using $sum

I have a collection students with documents in the following format:
{
  _id: "53fe74a866455060e003c2db",
  name: "sam",
  subject: "maths",
  marks: "77"
}
{
  _id: "53fe79cbef038fee879263d2",
  name: "ryan",
  subject: "bio",
  marks: "82"
}
{
  _id: "53fe74a866456060e003c2de",
  name: "tony",
  subject: "maths",
  marks: "86"
}
I want to get the sum of the marks of all students with subject = "maths", so I should get 163 as the sum.
db.students.aggregate([{ $match : { subject : "maths" } },
{ "$group" : { _id : "$subject", totalMarks : { $sum : "$marks" } } }])
I should get the following result:
{"result":[{"_id":"maths", "totalMarks":163}], "ok":1}
But I get:
{"result":[{"_id":"maths", "totalMarks":0}], "ok":1}
Can someone point out what I might be doing wrong here?
Your current schema stores the marks field as a string, but the aggregation framework needs a numeric data type to work out the sum. MapReduce, on the other hand, can calculate the sum as-is, since it allows the use of native JavaScript methods like parseInt() on your object properties in its map function. So overall you have two choices.
Option 1: Update Schema (Change Data Type)
The first would be to change the schema, or add another field to your documents that holds the actual numeric value rather than its string representation. If your collection is relatively small, you could use a combination of MongoDB's cursor find(), forEach() and update() methods to change the type of marks:
db.student.find({ "marks": { "$type": 2 } }).snapshot().forEach(function(doc) {
  db.student.update(
    { "_id": doc._id, "marks": { "$type": 2 } },
    { "$set": { "marks": parseInt(doc.marks) } }
  );
});
For relatively large collections, performance will be slow, and it's recommended to use bulk updates:
MongoDB versions >= 2.6 and < 3.2:
var bulk = db.student.initializeUnorderedBulkOp(),
    counter = 0;
db.student.find({ "marks": { "$exists": true, "$type": 2 } }).forEach(function(doc) {
  bulk.find({ "_id": doc._id }).updateOne({
    "$set": { "marks": parseInt(doc.marks) }
  });
  counter++;
  if (counter % 1000 === 0) {
    // Execute per 1000 operations
    bulk.execute();
    // re-initialize every 1000 update statements
    bulk = db.student.initializeUnorderedBulkOp();
  }
})
// Clean up remaining operations in queue
if (counter % 1000 !== 0) bulk.execute();
MongoDB version 3.2 and newer:
var ops = [],
    cursor = db.student.find({ "marks": { "$exists": true, "$type": 2 } });
cursor.forEach(function(doc) {
  ops.push({
    "updateOne": {
      "filter": { "_id": doc._id },
      "update": { "$set": { "marks": parseInt(doc.marks) } }
    }
  });
  if (ops.length === 1000) {
    db.student.bulkWrite(ops);
    ops = [];
  }
});
if (ops.length > 0) db.student.bulkWrite(ops);
Option 2: Run MapReduce
The second approach would be to rewrite your query with MapReduce where you can use the JavaScript function parseInt().
In your MapReduce operation, define the map function that process each input document. This function maps the converted marks string value to the subject for each document, and emits the subject and converted marks pair. This is where the JavaScript native function parseInt() can be applied. Note: in the function, this refers to the document that the map-reduce operation is processing:
var mapper = function () {
  var x = parseInt(this.marks);
  emit(this.subject, x);
};
Next, define the corresponding reduce function with two arguments keySubject and valuesMarks. valuesMarks is an array whose elements are the integer marks values emitted by the map function and grouped by keySubject.
The function reduces the valuesMarks array to the sum of its elements.
var reducer = function(keySubject, valuesMarks) {
  return Array.sum(valuesMarks);
};
db.student.mapReduce(
  mapper,
  reducer,
  {
    out: "example_results",
    query: { subject: "maths" }
  }
);
With your collection, the above will put your MapReduce aggregation result in a new collection db.example_results. Thus, db.example_results.find() will output:
/* 0 */
{
  "_id" : "maths",
  "value" : 163
}
Possible causes of your sum being returned as 0:
The field you are summing is not an integer but a string. Make sure the field contains numeric values.
You are using the wrong syntax for $sum.
db.c1.aggregate([{
$group: {
_id: "$item",
price: {
$sum: "$price"
},
count: {
$sum: 1
}
}
}])
Make sure you use "$price" and not "price".
One of the silliest mistakes that causes this:
using a space or tab inside the quotes when specifying the field name.
Example: "$price " won't work, but "$price" will.
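A quick sketch of that failure mode, reusing the c1 collection from the snippet above: without the $ prefix, $sum receives the constant string "price", which is non-numeric, so the result is 0.
// Wrong: sums the literal string "price", so every group gets 0.
db.c1.aggregate([{ $group: { _id: "$item", total: { $sum: "price" } } }])
// Right: "$price" is a field path, so the field's numeric values are summed.
db.c1.aggregate([{ $group: { _id: "$item", total: { $sum: "$price" } } }])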

Limiting results in MongoDB but still getting the full count?

For speed, I'd like to limit a query to 10 results
db.collection.find( ... ).limit(10)
However, I'd also like to know the total count, so to say "there were 124 but I only have 10". Is there a good efficient way to do this?
By default, count() ignores limit() and counts all the results of the query.
So when you, for example, do var a = db.collection.find(...).limit(10);
running a.count() will give you the total count for your query.
Calling count(true) (or count(1)) includes limit and skip.
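For example, a short shell sketch of the two behaviours (the filter and the counts are hypothetical):
var a = db.collection.find({ status: "active" }).limit(10);
a.count();     // e.g. 124 -- total matching documents, limit ignored
a.count(true); // 10 -- skip/limit applied to the count
a.size();      // 10 -- shortcut equivalent to count(true)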
The accepted answer by @johnnycrab is for the mongo CLI.
If you have to write the same code in Node.js and Express.js, you will have to use it like this to be able to use the "count" function along with the toArray "result":
var curFind = db.collection('tasks').find(query);
Then you can run two functions after it, one nested in the other:
curFind.count(function (e, count) {
// Use count here
curFind.skip(0).limit(10).toArray(function(err, result) {
// Use result here and count here
});
});
cursor.count() ignores cursor.skip() and cursor.limit() by default.
Source: http://docs.mongodb.org/manual/reference/method/cursor.count/#cursor.count
You can use a $facet stage which processes multiple aggregation pipelines within a single stage on the same set of input documents:
// { item: "a" }
// { item: "b" }
// { item: "c" }
db.collection.aggregate([
{ $facet: {
limit: [{ $limit: 2 }],
total: [{ $count: "count" }]
}},
{ $set: { total: { $first: "$total.count" } } }
])
// { limit: [{ item: "a" }, { item: "b" }], total: 3 }
This way, within the same query, you can get both some documents (limit: [{ $limit: 2 }]) and the total count of documents ({ $count: "count" }).
The final $set stage is an optional clean-up step, just there to project the result of the $count stage, such that "total" : [ { "count" : 3 } ] becomes total: 3.
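Note: $set is available from MongoDB 4.2 and $first as an array expression from 4.4; on older versions that support $facet (3.4+), the same clean-up can presumably be written with $addFields and $arrayElemAt:
db.collection.aggregate([
  { $facet: {
    limit: [{ $limit: 2 }],
    total: [{ $count: "count" }]
  }},
  // $arrayElemAt pulls the single count value out of the total array.
  { $addFields: { total: { $arrayElemAt: ["$total.count", 0] } } }
])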
There is a solution using $push and $slice: https://stackoverflow.com/a/39784851/4752635
I prefer using two queries:
The first filters and then groups by ID to get the number of filtered elements. Do not sort here, it is unnecessary.
The second filters, sorts and paginates.
The solution pushing $$ROOT and using $slice runs into the 16MB document memory limitation for large collections. Also, for large collections, two queries together seem to run faster than the one pushing $$ROOT. You can run them in parallel as well, so you are limited only by the slower of the two queries (probably the one which sorts).
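For context, a minimal sketch of the $push/$slice approach linked above, so its limitation is clear: every matching document is accumulated into a single group document, which is what hits the 16MB cap (skip and length are the pagination variables used in the code below).
db.collection.aggregate([
  { $match: {...} },
  { $sort: {...} },
  // All matching documents end up inside one group document:
  { $group: { _id: null, total: { $sum: 1 }, docs: { $push: "$$ROOT" } } },
  // One page is sliced out, but only after the full set was materialized above.
  { $project: { total: 1, docs: { $slice: ["$docs", skip, length] } } }
])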
I have settled on this solution using 2 queries and the aggregation framework (note: I use node.js in this example, but the idea is the same):
var aggregation = [
  {
    // If you can match fields at the beginning, match as many as early as possible.
    $match: {...}
  },
  {
    // Projection.
    $project: {...}
  },
  {
    // Some things you can match only after projection or grouping, so do it now.
    $match: {...}
  }
];
// Copy filtering elements from the pipeline - this is the same for both counting
// the number of filtered elements and for the pagination query.
var aggregationPaginated = aggregation.slice(0);
// Count filtered elements.
aggregation.push(
  {
    $group: {
      _id: null,
      count: { $sum: 1 }
    }
  }
);
// Sort in pagination query.
aggregationPaginated.push(
  {
    $sort: sorting
  }
);
// Paginate.
aggregationPaginated.push(
  {
    $limit: skip + length
  },
  {
    $skip: skip
  }
);
// I use mongoose.
// Get total count.
model.count(function(errCount, totalCount) {
  // Count filtered.
  model.aggregate(aggregation)
    .allowDiskUse(true)
    .exec(function(errFind, documents) {
      if (errFind) {
        // Errors.
        res.status(503);
        return res.json({
          'success': false,
          'response': 'err_counting'
        });
      }
      else {
        // Number of filtered elements.
        var numFiltered = documents[0].count;
        // Filter, sort and paginate.
        model.aggregate(aggregationPaginated)
          .allowDiskUse(true)
          .exec(function(errFindP, documentsP) {
            if (errFindP) {
              // Errors.
              res.status(503);
              return res.json({
                'success': false,
                'response': 'err_pagination'
              });
            }
            else {
              return res.json({
                'success': true,
                'recordsTotal': totalCount,
                'recordsFiltered': numFiltered,
                'response': documentsP
              });
            }
          });
      }
    });
});

How to limit the number of documents updated in MongoDB

How can I implement something similar to db.collection.find().limit(10), but while updating documents?
Right now I'm using something really crappy: getting the documents with db.collection.find().limit() and then updating them.
In general, I want to return a given number of records and change one field in each of them.
Thanks.
You can use:
db.collection.find().limit(NUMBER_OF_ITEMS_YOU_WANT_TO_UPDATE).forEach(
  function (e) {
    e.fieldToChange = "blah";
    ....
    db.collection.save(e);
  }
);
(Credits for forEach code: MongoDB: Updating documents using data from the same document)
What this will do is change only the number of entries you specify. So if you want to add a field called "newField" with value 1 to only half of the entries inside "collection", for example, you can put in:
db.collection.find().limit(db.collection.count() / 2).forEach(
  function (e) {
    e.newField = 1;
    db.collection.save(e);
  }
);
If you then want to make the other half also have "newField" but with value 2, you can do an update with the condition that newField doesn't exist:
db.collection.update( { newField : { $exists : false } }, { $set : { newField : 2 } }, {multi : true} );
Using forEach to update each document individually is slow. You can update the documents in bulk using:
ids = db.collection.find(<condition>).limit(<limit>).map(
  function(doc) {
    return doc._id;
  }
);
db.collection.updateMany({_id: {$in: ids}}, <update>)
The solutions that iterate over all objects and then update them individually are very slow.
Retrieving them all then updating simultaneously using $in is more efficient.
ids = People.where(firstname: 'Pablo').limit(10000).only(:_id).to_a.map(&:id)
People.in(_id: ids).update_all(lastname: 'Cantero')
The query is written using Mongoid, but can be easily rewritten in Mongo Shell as well.
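For instance, a rough mongo shell equivalent of the Mongoid snippet above (a sketch; db.people is assumed to be the collection behind the People model):
// Collect the _ids of the first 10000 matches, then update them in one call.
var ids = db.people.find({ firstname: 'Pablo' }, { _id: 1 }).limit(10000)
  .map(function(doc) { return doc._id; });
db.people.updateMany({ _id: { $in: ids } }, { $set: { lastname: 'Cantero' } });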
Unfortunately the workaround you have is the only way to do it AFAIK. There is a boolean flag multi which will either update all the matches (when true) or update the 1st match (when false).
As that answer states, there is still no way to limit the number of documents to update (or delete) to a value > 1. A workaround is to use something like:
db.collection.find(<condition>).limit(<limit>).forEach(function(doc) {
  db.collection.update({_id: doc._id}, {<your update>})
})
If your id is a sequence number and not an ObjectId, you can do this in a for loop (multi: true is needed so each call updates the whole batch):
let batchSize = 10;
for (let i = 0; i <= 1000000; i += batchSize) {
  db.collection.update(
    {$and: [{"_id": {$lte: i + batchSize}}, {"_id": {$gt: i}}]},
    {<your update>},
    {multi: true}
  )
}
let fetchStandby = await db.model.distinct("key", {});
fetchStandby = fetchStandby.slice(0, no_of_docs_to_be_updated);
let fetch = await db.model.updateMany(
  { key: { $in: fetchStandby } },
  { $set: { "qc.status": "pending" } }
);
I also recently wanted something like this. I think querying for a long list of _id values just to update with $in is perhaps slow too, so I tried to use an aggregation + $merge:
while (true) {
  const record = db.records.findOne({ isArchived: false }, { _id: 1 })
  if (!record) {
    print("No more records")
    break
  }
  db.records.aggregate([
    { $match: { isArchived: false } },
    { $limit: 100 },
    {
      $project: {
        _id: 1,
        isArchived: { $literal: true },
        updatedAt: { $literal: new Date() }
      }
    },
    {
      $merge: {
        into: "records",
        on: "_id",
        whenMatched: "merge"
      }
    }
  ])
  print("Done update")
}
But feel free to comment on whether this is better or worse than a bulk update with $in.