MongoDB Aggregation using nested element - mongodb

I have a collection with documents like this:
{
"_id" : "15",
"name" : "empty",
"location" : "5th Ave",
"owner" : "machine",
"visitors" : [
{
"type" : "M",
"color" : "blue",
"owner" : "Steve Cooper"
},
{
"type" : "K",
"color" : "red",
"owner" : "Luis Martinez"
},
// A lot more of these
]
}
I want to group by visitors.owner to find which owner has the most visits. I tried this:
db.mycol.aggregate(
[
{$group: {
_id: {owner: "$visitors.owner"},
visits: {$addToSet: "$visits"},
count: {$sum: "comments"}
}},
{$sort: {count: -1}},
{$limit: 1}
]
)
But I always get count = 0, and visits does not correspond to a single owner :/
Please help

Try the following aggregation pipeline:
db.mycol.aggregate([
{
"$unwind": "$visitors"
},
{
"$group": {
"_id": "$visitors.owner",
"count": { "$sum": 1}
}
},
{
"$project": {
"_id": 0,
"owner": "$_id",
"visits": "$count"
}
}
]);
Using the sample document you provided in your question, the result is:
/* 0 */
{
"result" : [
{
"owner" : "Luis Martinez",
"visits" : 1
},
{
"owner" : "Steve Cooper",
"visits" : 1
}
],
"ok" : 1
}
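The pipeline above returns a count for every owner. To answer the original question (which single owner has the most visits), one option is to sort by the count and keep only the first result. The sketch below is not from the original answer: topOwnerPipeline adds $sort and $limit stages to it, and topOwner is a hypothetical plain-JavaScript helper that computes the same thing in memory, so the logic can be checked without a server.

```javascript
// Assumed extension of the pipeline above: keep only the owner with the most visits.
const topOwnerPipeline = [
  { $unwind: "$visitors" },
  { $group: { _id: "$visitors.owner", count: { $sum: 1 } } },
  { $sort: { count: -1 } },
  { $limit: 1 },
  { $project: { _id: 0, owner: "$_id", visits: "$count" } }
];

// Hypothetical in-memory equivalent, useful for checking the logic locally.
function topOwner(docs) {
  const counts = new Map();
  for (const doc of docs) {
    // Documents without a visitors array contribute nothing, as with $unwind.
    for (const visitor of doc.visitors || []) {
      counts.set(visitor.owner, (counts.get(visitor.owner) || 0) + 1);
    }
  }
  let best = null;
  for (const [owner, visits] of counts) {
    if (best === null || visits > best.visits) {
      best = { owner, visits };
    }
  }
  return best; // null when there are no visitors at all
}
```

In the shell this would be run as db.mycol.aggregate(topOwnerPipeline). If several owners tie for first place, $limit keeps just one of them, so drop the $limit stage if you need all of them.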

Related

Filter by nested arrays/objects values (on different levels) and $push by multiple level - MongoDB Aggregate

I have a document with multiple levels of embedded subdocuments, each with nested arrays. I use $unwind and $sort to sort by day in descending order, then $push to combine the rows back into a single array. This works for only one level of $push. When I try to do the same at the nested level while retaining the top-level data, I get "errmsg" : "Unrecognized expression '$push'".
{
"_id" : ObjectId("5f5638d0ff25e01482432803"),
"name" : "XXXX",
"mobileNo" : 323232323,
"payroll" : [
{
"_id" : ObjectId("5f5638d0ff25e01482432801"),
"month" : "Jan",
"salary" : 18200,
"payrollDetails" : [
{
"day" : "1",
"salary" : 200,
},
{
"day" : "2",
"salary" : 201,
}
]
},
{
"_id" : ObjectId("5f5638d0ff25e01482432802"),
"month" : "Feb",
"salary" : 8300,
"payrollDetails" : [
{
"day" : "1",
"salary" : 300,
},
{
"day" : "2",
"salary" : 400,
}
]
}
],
}
Expected Result:
{
"_id" : ObjectId("5f5638d0ff25e01482432803"),
"name" : "XXXX",
"mobileNo" : 323232323,
"payroll" : [
{
"_id" : ObjectId("5f5638d0ff25e01482432801"),
"month" : "Jan",
"salary" : 18200,
"payrollDetails" : [
{
"day" : "2",
"salary" : 201
},
{
"day" : "1",
"salary" : 200
}
]
},
{
"_id" : ObjectId("5f5638d0ff25e01482432802"),
"month" : "Feb",
"salary" : 8300,
"payrollDetails" : [
{
"day" : "2",
"salary" : 400
},
{
"day" : "1",
"salary" : 300
}
]
}
],
}
Only the days should be sorted; everything else stays the same.
I tried the following, but it fails with "Unrecognized expression '$push'":
db.employee.aggregate([
{$unwind: '$payroll'},
{$unwind: '$payroll.payrollDetails'},
{$sort: {'payroll.payrollDetails.day': -1}},
{$group: {_id: '$_id', payroll: {$push: {payrollDetails:{$push:
'$payroll.payrollDetails'} }}}}])
This requires two $group stages, because you can't nest one $push accumulator inside another in the same field:
$group by main id and payroll id, constructing the payrollDetails array
$sort by payroll id (you can skip this if not required)
$group by main id, constructing the payroll array
db.employee.aggregate([
{ $unwind: "$payroll" },
{ $unwind: "$payroll.payrollDetails" },
{ $sort: { "payroll.payrollDetails.day": -1 } },
{
$group: {
_id: {
_id: "$_id",
pid: "$payroll._id"
},
name: { $first: "$name" },
mobileNo: { $first: "$mobileNo" },
payrollDetails: { $push: "$payroll.payrollDetails" },
month: { $first: "$payroll.month" },
salary: { $first: "$payroll.salary" }
}
},
// after the first $group the payroll id lives in _id.pid; ascending matches the expected output
{ $sort: { "_id.pid": 1 } },
{
$group: {
_id: "$_id._id",
name: { $first: "$name" },
mobileNo: { $first: "$mobileNo" },
payroll: {
$push: {
_id: "$_id.pid",
month: "$month",
salary: "$salary",
payrollDetails: "$payrollDetails"
}
}
}
}
])
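As a cross-check on what the pipeline should produce, here is a small sketch that is not part of the original answer. sortPayrollDetails is a hypothetical client-side helper that sorts each nested payrollDetails array by day in descending order and leaves everything else alone. sortNestedStage is a single-stage alternative that assumes MongoDB 5.2 or later, where the $sortArray operator is available.

```javascript
// Hypothetical helper: returns a copy of the employee document with every
// payroll entry's payrollDetails sorted by "day", descending.
function sortPayrollDetails(employee) {
  return {
    ...employee,
    payroll: (employee.payroll || []).map((entry) => ({
      ...entry,
      // Copy before sorting so the input document is not mutated.
      payrollDetails: [...(entry.payrollDetails || [])].sort((a, b) =>
        // "day" is a string in the sample data, so this is a string
        // comparison ("10" sorts before "2"), not a numeric one.
        a.day < b.day ? 1 : a.day > b.day ? -1 : 0
      )
    }))
  };
}

// Assumption: MongoDB 5.2+ ($sortArray). Rewrites each payroll entry in place.
const sortNestedStage = {
  $set: {
    payroll: {
      $map: {
        input: "$payroll",
        as: "p",
        in: {
          $mergeObjects: [
            "$$p",
            {
              payrollDetails: {
                $sortArray: { input: "$$p.payrollDetails", sortBy: { day: -1 } }
              }
            }
          ]
        }
      }
    }
  }
};
```

With the stage form, db.employee.aggregate([sortNestedStage]) keeps name, mobileNo and the order of the payroll entries without any $unwind or $group.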

Merge Multiple Document from same collection MongoDB

I have JSON data like this, and I want to aggregate it so that it is grouped by the "from" date inside data:
{
"series": [
{
"id": "1",
"element": "111",
"data": [
{
"timeFrame": {
"from": "2016-01-01T00:00:00Z",
"to": "2016-01-31T23:59:59Z"
},
"value": 1
},
{
"timeFrame": {
"from": "2016-02-01T00:00:00Z",
"to": "2016-02-29T23:59:59Z"
},
"value": 2
}
]
}
]
}
I achieved this with the following aggregation:
db.getCollection('col1').aggregate([
{$unwind: "$data"},
{$group :{
element: {$first:"$relatedElement"},
_id : {
day : {$dayOfMonth: "$values.timeFrame.from"},
month:{$month: "$values.timeFrame.from"},
year:{$year: "$values.timeFrame.from"}
},
fromDate : { $first : "$values.timeFrame.from" },
total : {$sum : "$values.value"},
count : {$sum : 1},
}
},
{
$project: {
_id : 0,
element:1,
fromDate : '$fromDate',
avgValue : { $divide: [ "$total", "$count" ] }
}
}])
Output:
{
"id" : "1",
"element" : "3",
"fromDate" : ISODate("2017-05-01T00:00:00.000Z"),
"avgValue" : 0.0378787878787879
}
{
"id" : "1",
"element" : "3",
"fromDate" : ISODate("2017-04-30T22:00:00.000Z"),
"avgValue" : 0.416666666666667
}
But I am getting two documents, and I want to merge them into a single document like this:
{
"id" : "1",
"element" : "3",
"average" : [
{
"fromDate" : ISODate("2017-05-01T00:00:00.000Z"),
"avgValue" : 0.0378787878787879
},
{
"fromDate" : ISODate("2017-04-30T22:00:00.000Z"),
"avgValue" : 0.416666666666667
}
]
}
Can anyone help me with this?
Add the following $group stage at the end of your aggregation pipeline to merge the current output documents into a single document:
{$group:{
_id:"$_id",
element: {$first: "$element"},
average:{$push:{
"fromDate": "$fromDate",
"avgValue": "$avgValue"
}}
}}
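One thing to watch for: the preceding $project removes _id, so grouping on "$_id" groups on a missing field and every row lands in a single group. That gives the single document asked for here, but it would also merge rows that belong to different elements. The sketch below is an assumption on my part rather than the answer's method: it keys the merge on element instead. mergeByElement is a made-up helper that does the same thing in plain JavaScript over the projected rows.

```javascript
// Assumed variant of the final stage: one output document per element.
const mergeStage = {
  $group: {
    _id: "$element",
    average: { $push: { fromDate: "$fromDate", avgValue: "$avgValue" } }
  }
};

// Hypothetical in-memory equivalent over the rows produced by the $project stage.
function mergeByElement(rows) {
  const groups = new Map();
  for (const row of rows) {
    if (!groups.has(row.element)) {
      groups.set(row.element, { element: row.element, average: [] });
    }
    groups.get(row.element).average.push({
      fromDate: row.fromDate,
      avgValue: row.avgValue
    });
  }
  return [...groups.values()];
}
```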

How to get the maximum value of a field for each group with the array of corresponding documents?

I have a collection like
{
"_id" : ObjectId("5738cb363bb56eb8f76c2ba8"),
"records" : [
{
"Name" : "Joe",
"Salary" : 70000,
"Department" : "IT"
}
]
},
{
"_id" : ObjectId("5738cb363bb56eb8f76c2ba9"),
"records" : [
{
"Name" : "Henry",
"Salary" : 80000,
"Department" : "Sales"
},
{
"Name" : "Jake",
"Salary" : 40000,
"Department" : "Sales"
}
]
},
{
"_id" : ObjectId("5738cb363bb56eb8f76c2baa"),
"records" : [
{
"Name" : "Sam",
"Salary" : 90000,
"Department" : "IT"
},
{
"Name" : "Tom",
"Salary" : 50000,
"Department" : "Sales"
}
]
}
I want to get the employee with the highest salary in each department:
{"Name": "Sam", "Salary": 90000, "Department": "IT"}
{"Name": "Henry", "Salary": 80000, "Department": "Sales"}
I could get the highest salary. But I could not get the corresponding employee names.
db.HR.aggregate([
{ "$unwind": "$records" },
{ "$group":
{
"_id": "$records.Department",
"max_salary": { "$max": "$records.Salary" }
}
}
])
Could somebody help me?
You need to $sort your documents after $unwind and use the $first operator in the $group stage. You can also use the $last operator, in which case you will need to sort your documents in ascending order.
db.HR.aggregate([
{ '$unwind': '$records' },
{ '$sort': { 'records.Salary': -1 } },
{ '$group': {
'_id': '$records.Department',
'Name': { '$first': '$records.Name' } ,
'Salary': { '$first': '$records.Salary' }
}}
])
which produces:
{ "_id" : "Sales", "Name" : "Henry", "Salary" : 80000 }
{ "_id" : "IT", "Name" : "Sam", "Salary" : 90000 }
To return the maximum salary and the list of employees who earn it in each department, use $max in the $group stage to get the maximum "Salary" for each group, and the $push accumulator to collect the "Name" and "Salary" of every employee in the group. Then, in the $project stage, use the $map operator with $cond to compare each employee's salary to the maximum, yielding the name on a match and false otherwise. $setDifference then filters out all the false values, which is fine as long as the data being filtered is "unique". In this case it "should" be fine, but if any two results contained the same "Name", they would be collapsed into one and skew the results.
db.HR.aggregate([
{ '$unwind': '$records' },
{ '$group': {
'_id': '$records.Department',
'maxSalary': { '$max': '$records.Salary' },
'persons': {
'$push': {
'Name': '$records.Name',
'Salary': '$records.Salary'
}
}
}},
{ '$project': {
'maxSalary': 1,
'persons': {
'$setDifference': [
{ '$map': {
'input': '$persons',
'as': 'person',
'in': {
'$cond': [
{ '$eq': [ '$$person.Salary', '$maxSalary' ] },
'$$person.Name',
false
]
}
}},
[false]
]
}
}}
])
which yields:
{ "_id" : "Sales", "maxSalary" : 80000, "persons" : [ "Henry" ] }
{ "_id" : "IT", "maxSalary" : 90000, "persons" : [ "Sam" ] }
It's not the most intuitive thing, but instead of $max you should be using $sort and $first:
{ "$unwind": "$records" },
{ "$sort": { "records.Salary": -1 } },
{ "$group" :
{
"_id": "$records.Department",
"max_salary": { "$first": "$records.Salary" },
"name": { "$first": "$records.Name" }
}
}
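Alternatively, I thought this might be doable using the $$ROOT variable (fair warning: I've not actually tried this). Note that $group requires every field other than _id to be an accumulator expression, so the "name" line below is rejected as written:
{ "$unwind": "$records" },
{ "$group":
{
"_id": "$records.Department",
"max_salary": { "$max": "$records.Salary" },
"name" : "$$ROOT.records.Name"
}
}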
Another possible solution:
db.HR.aggregate([
{"$unwind": "$records"},
{"$group":{
"_id": "$records.Department",
"arr": {"$push": {"Name":"$records.Name", "Salary":"$records.Salary"}},
"maxSalary": {"$max":"$records.Salary"}
}},
{"$unwind": "$arr"},
{"$project": {
"_id":1,
"arr":1,
"isMax":{"$eq":["$arr.Salary", "$maxSalary"]}
}},
{"$match":{
"isMax":true
}}
])
This solution takes advantage of the $eq operator to compare two fields in the $project stage.
Test case:
db.HR.insert({"records": [{"Name": "Joe", "Salary": 70000, "Department": "IT"}]})
db.HR.insert({"records": [{"Name": "Henry", "Salary": 80000, "Department": "Sales"}, {"Name": "Jake", "Salary": 40000, "Department": "Sales"}, {"Name": "Santa", "Salary": 90000, "Department": "IT"}]})
db.HR.insert({"records": [{"Name": "Sam", "Salary": 90000, "Department": "IT"}, {"Name": "Tom", "Salary": 50000, "Department": "Sales"}]})
Result:
{ "_id" : "Sales", "arr" : { "Name" : "Henry", "Salary" : 80000 }, "isMax" : true }
{ "_id" : "IT", "arr" : { "Name" : "Santa", "Salary" : 90000 }, "isMax" : true }
{ "_id" : "IT", "arr" : { "Name" : "Sam", "Salary" : 90000 }, "isMax" : true }
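The answers above differ mainly in how they treat ties: $sort with $first keeps one employee per department, while the $max-based pipelines keep everyone who earns the maximum. A plain-JavaScript sketch of the tie-preserving behaviour, handy for checking a pipeline's output against the test case (topEarnersByDepartment is a hypothetical helper, not something MongoDB provides):

```javascript
// Hypothetical helper: for each department, the maximum salary and the names
// of every employee who earns it.
function topEarnersByDepartment(docs) {
  const byDept = new Map(); // department -> { maxSalary, names }
  for (const doc of docs) {
    for (const rec of doc.records || []) {
      const current = byDept.get(rec.Department);
      if (!current || rec.Salary > current.maxSalary) {
        // New maximum: start a fresh list of names.
        byDept.set(rec.Department, { maxSalary: rec.Salary, names: [rec.Name] });
      } else if (rec.Salary === current.maxSalary) {
        // Tie with the current maximum: keep both employees.
        current.names.push(rec.Name);
      }
    }
  }
  return byDept;
}
```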

How to query a mongo collection to return the full document with virtual fields containing calculated values from the sub-document?

I'm trying to query a collection for a specific document that contains a sub-document. I'd like to find the tests with the highest and lowest scores in that sub-document and return them as virtual fields on the original document.
I have the following dataset:
{
"_id" : "d0e78492342f9f-f843ec7-4bd14g3h-bh34j3a9-02d6ah32k8e6b79e",
"name" : "Addison Hunt",
"tests" : [
{
"name" : "lorem",
"score" : 79
},
{
"name" : "vallum",
"score" : 100
},
{
"name" : "ipsum",
"score" : 65
}
],
"created_at" : 1401488865684,
"class" : "dolor sit amit",
"user_id" : "005G5635231325O4VIAU"
}
In mongo 2.4, how can I query mongo once to return the following result:
{
"_id" : "d0e78492342f9f-f843ec7-4bd14g3h-bh34j3a9-02d6ah32k8e6b79e",
"name" : "Addison Hunt",
"tests" : [
{
"name" : "lorem",
"score" : 79
},
{
"name" : "vallum",
"score" : 100
},
{
"name" : "ipsum",
"score" : 65
}
],
"created_at" : 1401488865684,
"class" : "dolor sit amit",
"user_id" : "005G5635231325O4VIAU",
"worst_test": {
"name" : "ipsum",
"score" : 65
},
"best_test": {
"name" : "vallum",
"score" : 100
}
}
Where "best_test" and "worst_test" are virtual fields representing the tests with the highest and lowest scores, respectively.
I've tried with many different ways and the closest I've gotten is with this query:
db.students.aggregate([
{ $match: {
'_id': 'd0e78492342f9f-f843ec7-4bd14g3h-bh34j3a9-02d6ah32k8e6b79e'
}},
{ $unwind: '$tests' },
{ $sort: {'tests.score': 1} },
{ $group: {
_id: '$_id',
student_tests: {$push: "$$ROOT"},
worst_test: {$first: '$tests'},
best_test: { $last: '$tests' }
}}
]);
Which yields this result:
{
"_id" : "d0e78492342f9f-f843ec7-4bd14g3h-bh34j3a9-02d6ah32k8e6b79e",
"student_tests" : [
{
"name" : "Addison Hunt",
"tests" : [
{
"name" : "ipsum",
"score" : 65
}
],
"created_at" : 1401488865684,
"class" : "dolor sit amit",
"user_id" : "005G5635231325O4VIAU",
},
{
"name" : "Addison Hunt",
"tests" : [
{
"name" : "lorem",
"score" : 79
}
],
"created_at" : 1401488865684,
"class" : "dolor sit amit",
"user_id" : "005G5635231325O4VIAU",
},
{
"name" : "Addison Hunt",
"tests" : [
{
"name" : "vallum",
"score" : 100
}
],
"created_at" : 1401488865684,
"class" : "dolor sit amit",
"user_id" : "005G5635231325O4VIAU",
},
],
"worst_test": {
"name" : "ipsum",
"score" : 65
},
"best_test": {
"name" : "vallum",
"score" : 100
}
}
If you are using $$ROOT, then you are in fact using MongoDB 2.6, as this aggregation variable was only introduced in that version.
While handy for various things, all it does is represent the entire document at the current stage of the pipeline. To return the original document unmodified but with additional fields, you could use it in a $project stage before the $unwind and assign it to the _id field. Even then you don't get exactly the same document back, as you still need a $project at the end to rebuild the correct document shape from those elements.
Your best bet is just projecting the fields, while keeping an unaltered copy of the array before any $sort is applied:
db.students.aggregate([
{ "$match": {
"_id": "d0e78492342f9f-f843ec7-4bd14g3h-bh34j3a9-02d6ah32k8e6b79e"
}},
{ "$project": {
"name": 1,
"tests": 1,
"created_at": 1,
"class": 1,
"user_id": 1,
"testCopy": "$tests"
}},
{ "$unwind": "$testCopy" },
{ "$sort": { "testCopy.score": 1 } },
{ "$group": {
"_id": "$_id",
"name": { "$first": "$name" },
"tests": { "$first": "$tests" },
"created_at": { "$first": "$created_at" },
"class": { "$first": "$class" },
"user_id": { "$first": "$user_id" },
"worst_test": { "$first": "$testCopy" },
"best_test": { "$last": "$testCopy" }
}}
]);
Or, using $$ROOT as mentioned before (alternatively, you can place the fields under the _id individually in the $project):
db.students.aggregate([
{ "$match": {
"_id": "d0e78492342f9f-f843ec7-4bd14g3h-bh34j3a9-02d6ah32k8e6b79e"
}},
{ "$project": {
"_id": "$$ROOT",
"tests": 1
}},
{ "$unwind": "$tests" },
{ "$sort": { "tests.score": 1 } },
{ "$group": {
"_id": "$_id",
"aworst_test": { "$first": "$tests" },
"abest_test": { "$last": "$tests" }
}},
{ "$project": {
"_id": "$_id._id",
"name": "$_id.name",
"tests": "$_id.tests",
"created_at": "$_id.created_at",
"class": "$_id.class",
"user_id": "$_id.user_id",
"worst_test": "$aworst_test",
"best_test": "$abest_test"
}}
]);
But as you can see, you are still doing the $project work somewhere to get the structure you want. The "renamed" fields (aworst_test, abest_test) are there to maintain the field order you want, since $project will otherwise "optimize" by keeping any fields that have not been renamed in place and appending new fields after the existing ones.
There really is no simple way to "get all fields" in the same way as you originally found them. Operations like $project and $group are an "all or nothing" affair, where they only explicitly produce what you tell them to.
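If reshaping the document in the pipeline is more trouble than it is worth, another option (not part of the answer above) is to fetch the document unchanged and compute the two fields in application code. A minimal sketch, where withBestAndWorst is a hypothetical helper name:

```javascript
// Hypothetical helper: returns a copy of the student document with
// worst_test and best_test appended after the existing fields.
function withBestAndWorst(student) {
  const tests = student.tests || [];
  if (tests.length === 0) {
    return { ...student }; // nothing to compute
  }
  let worst = tests[0];
  let best = tests[0];
  for (const test of tests) {
    if (test.score < worst.score) worst = test;
    if (test.score > best.score) best = test;
  }
  // Spreading first keeps every original field, in its original order.
  return { ...student, worst_test: worst, best_test: best };
}
```

The trade-off is that the work happens on the client, which is fine for a single document matched by _id but less attractive for large result sets.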

MongoDB aggregate group array to key : sum value

Hello I am new to mongodb and trying to convert objects with different types (int) into key value pairs.
I have collection like this:
{
"_id" : ObjectId("5372a9fc0079285635db14d8"),
"type" : 1,
"stat" : "foobar"
},
{
"_id" : ObjectId("5372aa000079285635db14d9"),
"type" : 1,
"stat" : "foobar"
},
{
"_id" : ObjectId("5372aa010079285635db14da"),
"type" : 2,
"stat" : "foobar"
},{
"_id" : ObjectId("5372aa030079285635db14db"),
"type" : 3,
"stat" : "foobar"
}
I want to get result like this:
{
"type1" : 2, "type2" : 1, "type3" : 1,
"stat" : "foobar"
}
Currently I am trying a $group stage that pushes the type values into an array:
db.types.aggregate(
{$group : {
_id : "$stat",
types : {$push : "$type"}
}}
)
But I don't know how to sum the different types and convert them into key-value pairs:
/* 0 */
{
"result" : [
{
"_id" : "foobar",
"types" : [
1,
2,
2,
3
]
}
],
"ok" : 1
}
For your exact output format, and assuming that you know the possible values for "type" in advance, you can do this with two $group stages and some use of the $cond operator:
db.types.aggregate([
{ "$group": {
"_id": {
"stat": "$stat",
"type": "$type"
},
"count": { "$sum": 1 }
}},
{ "$group": {
"_id": "$_id.stat",
"type1": { "$sum": { "$cond": [
{ "$eq": [ "$_id.type", 1 ] },
"$count",
0
]}},
"type2": { "$sum": { "$cond": [
{ "$eq": [ "$_id.type", 2 ] },
"$count",
0
]}},
"type3": { "$sum": { "$cond": [
{ "$eq": [ "$_id.type", 3 ] },
"$count",
0
]}}
}}
])
Which gives exactly:
{ "_id" : "foobar", "type1" : 2, "type2" : 1, "type3" : 1 }
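Since the second $group stage repeats the same $cond block for every known type, it can also be generated rather than typed out. A small sketch under that assumption (buildTypeCountStage is a made-up helper; the list of types still has to be known up front):

```javascript
// Hypothetical helper: builds the second $group stage for a known list of types.
function buildTypeCountStage(types) {
  const group = { _id: "$_id.stat" };
  for (const type of types) {
    // One "type<N>" field per known type, summing the counts that match it.
    group["type" + type] = {
      $sum: { $cond: [{ $eq: ["$_id.type", type] }, "$count", 0] }
    };
  }
  return { $group: group };
}

// The full pipeline for types 1, 2 and 3.
const pipeline = [
  { $group: { _id: { stat: "$stat", type: "$type" }, count: { $sum: 1 } } },
  buildTypeCountStage([1, 2, 3])
];
```

Passing pipeline to db.types.aggregate() should behave like the hand-written version, as long as the list matches the type values actually stored.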
I actually prefer the more dynamic form with two $group stages though:
db.types.aggregate([
{ "$group": {
"_id": {
"stat": "$stat",
"type": "$type"
},
"count": { "$sum": 1 }
}},
{ "$group": {
"_id": "$_id.stat",
"types": { "$push": {
"type": "$_id.type",
"count": "$count"
}}
}}
])
Not the same output but functional and flexible to the values:
{
"_id" : "foobar",
"types" : [
{
"type" : 3,
"count" : 1
},
{
"type" : 2,
"count" : 1
},
{
"type" : 1,
"count" : 2
}
]
}
Otherwise, if you need the same output format with flexible fields, you can always use mapReduce, though the output is not exactly the same:
db.types.mapReduce(
function () {
var obj = { };
var key = "type" + this.type;
obj[key] = 1;
emit( this.stat, obj );
},
function (key,values) {
var obj = {};
values.forEach(function(value) {
for ( var k in value ) {
if ( !obj.hasOwnProperty(k) )
obj[k] = 0;
obj[k] += value[k]; // add the emitted count, so re-reducing partial results stays correct
}
});
return obj;
},
{ "out": { "inline": 1 } }
)
And in typical mapReduce style:
"results" : [
{
"_id" : "foobar",
"value" : {
"type1" : 2,
"type2" : 1,
"type3" : 1
}
}
],
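The server may call the reduce function more than once for the same key, feeding earlier reduce results back in as values, so reduce has to add up the counts it receives rather than count the values it sees. The sketch below exercises that property outside the server. It is an illustration only: mapFn takes the document and an emit callback as parameters (in the shell they are this and a global emit), and runInMemory is a hypothetical stand-in for the mapReduce machinery.

```javascript
// Map step: emit { "type<N>": 1 } keyed by stat.
function mapFn(doc, emit) {
  const obj = {};
  obj["type" + doc.type] = 1;
  emit(doc.stat, obj);
}

// Reduce step: sum the counts per field, so partial results can be re-reduced.
function reduceFn(key, values) {
  const obj = {};
  for (const value of values) {
    for (const k of Object.keys(value)) {
      obj[k] = (obj[k] || 0) + value[k];
    }
  }
  return obj;
}

// Hypothetical driver: reduces each key's values in chunks, then reduces the
// partial results again, to mimic repeated reduce calls.
function runInMemory(docs, chunkSize) {
  const emitted = new Map();
  for (const doc of docs) {
    mapFn(doc, (key, value) => {
      if (!emitted.has(key)) emitted.set(key, []);
      emitted.get(key).push(value);
    });
  }
  const results = {};
  for (const [key, values] of emitted) {
    const partials = [];
    for (let i = 0; i < values.length; i += chunkSize) {
      partials.push(reduceFn(key, values.slice(i, i + chunkSize)));
    }
    results[key] = reduceFn(key, partials);
  }
  return results;
}
```

Whatever chunk size is used, the totals per type should come out the same; a reduce that merely increments a counter per value would not have that property.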
But those are your options.
Is this close enough for you?
{ "_id" : "foobar", "types" : [ { "type" : "type3", "total" : 1 }, { "type" : "type2", "total" : 1 }, { "type" : "type1", "total" : 2 } ] }
The types are in an array, but it seems to get you the data you are looking for. Code is:
db.types.aggregate(
[{$group : {
_id : "$stat",
types : {$push : "$type"}
}},
{$unwind:"$types"},
{$group: {
_id:{stat:"$_id",
types: {$substr: ["$types", 0, 1]}},
total:{$sum:1}}},
{$project: {
_id:0,
stat:"$_id.stat",
type: { $concat: [ "type", "$_id.types" ] },
total:"$total" }},
{$group: {
_id: "$stat",
types: { $push: { type: "$type", total: "$total" } } }}
]
)