Mongo Aggregation grouping subdocument - mongodb

I'm kinda stuck doing something seemingly simple with MongoDB's aggregation framework.
Imagine you have documents that look like this:
[
{ a: 1, b: 2 },
{ a: 1, b: 3 },
{ a: 5, b: 6 }
]
How can you group documents by the field a and then regroup the sub-documents by another field, say b, while still calculating the total number of documents at each step?
For our example, the result would be the following output document:
{
results: [
{
_id: {
a: 1
},
sum_a: 2,
doc_a: [
{
_id: {
b: 2
},
sum_b: 1
},
{
_id: {
b: 3
},
sum_b: 1
}
]
},
{
_id: {
a: 5
},
sum_a: 1,
doc_a: [
{
_id: {
b: 6
},
sum_b: 1
}
]
}
]
}
I tried things like this:
printjson(db.getSiblingDB('mydb').mycollection.aggregate([
{
$project: {
a: 1,
b: 1
}
},
{
$group: {
_id: {
a: '$a'
},
sum_a: {
$sum: 1
},
b: {
$first: '$b'
}
}
},
{
$group: {
_id: {
b: '$b'
},
sum_b: {
$sum: 1
}
}
},
{
$sort: {
sum_a: 1
}
}
]));
But in the different tests I made, it keeps overwriting the results of the previous $group stage, calculating the sums wrongly, etc.
So I'm not really sure how to approach this problem.

If you first group by the main field ('a') and the sub-field ('b') together, then group by 'a' alone (summing the counts from the first step) and push each 'b' into an array (together with its count from the first step), you get what you need:
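db.getSiblingDB('mydb').mycollection.aggregate([
// Stage 1: one document per distinct (a, b) pair, with its count
{
$group : {
_id : {
a : '$a',
b : '$b'
},
count : {
$sum : 1
}
}
},
// Stage 2: regroup by a alone, summing the pair counts and collecting each b with its count
{
$group : {
_id : {
a : '$_id.a'
},
count_a : {$sum: '$count'},
doc_a : {
$push : {
b : '$_id.b',
count_b : '$count'
}
}
}
}
])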
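To see what the two $group stages compute, here is a minimal in-memory sketch in plain JavaScript (not MongoDB code). It assumes the documents are already loaded into an array, and the function name groupTwoLevels is hypothetical. The first pass counts each distinct (a, b) pair, like the first $group; the second pass folds those pair counts into one entry per a, like the second $group with $sum and $push.

```javascript
// Hypothetical helper: mirrors the two $group stages on an in-memory array.
function groupTwoLevels(docs) {
  // Stage 1: count documents per distinct (a, b) pair.
  const pairCounts = new Map();
  for (const { a, b } of docs) {
    const key = JSON.stringify([a, b]);
    pairCounts.set(key, (pairCounts.get(key) || 0) + 1);
  }

  // Stage 2: regroup the pair counts by a alone.
  const byA = new Map();
  for (const [key, count] of pairCounts) {
    const [a, b] = JSON.parse(key);
    if (!byA.has(a)) {
      byA.set(a, { _id: { a }, count_a: 0, doc_a: [] });
    }
    const group = byA.get(a);
    group.count_a += count;                  // like count_a: { $sum: '$count' }
    group.doc_a.push({ b, count_b: count }); // like doc_a: { $push: { ... } }
  }
  return [...byA.values()];
}

const docs = [{ a: 1, b: 2 }, { a: 1, b: 3 }, { a: 5, b: 6 }];
console.log(JSON.stringify(groupTwoLevels(docs)));
```

Unlike this sketch, which keeps insertion order, $group does not guarantee the order of its output, so add a $sort stage if the order of the groups matters.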

Related

MongoDB count number of non-missing fields

I'm using the following code to calculate average and standard deviation of a field named "b" in my collection.
db.ctg.aggregate(
[
{
$group:
{
_id: "b",
avg: { $avg: "$b" },
stdev: { $stdDevPop: "$b" }
}
}
]
)
The result is:
{ "_id" : "b", "avg" : 878.4397930385701, "stdev" : 893.8744489449962 }
I need to add the number of non-missing values of "b" to my result so it looks like this:
{ "_id" : "b", "avg" : 878.4397930385701, "stdev" : 893.8744489449962, "nonmissing": 2126 }
How can I do this in the query above?
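The results of $avg and $stdDevPop don't change if you first remove the documents where b doesn't exist, because both operators ignore documents where the field is missing or non-numeric. So you can filter those documents out and count the remaining ones with $sum. (If b can also hold null or non-numeric values, match on { b: { $type: "number" } } instead of { $exists: true }, so the count covers exactly the values that were averaged.)
Query: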
db.ctg.aggregate([
{ $match: { b: { $exists: true } } },
{
$group:
{
_id: "b",
avg: { $avg: "$b" },
stdev: { $stdDevPop: "$b" },
nonMissing: { $sum: 1 }
}
}
])
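The claim that dropping documents without b leaves the average and standard deviation unchanged can be checked with a short plain-JavaScript sketch (not MongoDB code). It assumes, as the answer says, that $avg and $stdDevPop skip missing and non-numeric values; the function name summarize and the sample data are hypothetical. It counts only numeric values of b.

```javascript
// Hypothetical helper: mimics avg / stdDevPop / count over the numeric "b" values.
function summarize(docs) {
  // Keep only numeric b values, as $avg and $stdDevPop do.
  const values = docs.map(d => d.b).filter(v => typeof v === "number");
  const n = values.length;
  if (n === 0) return { avg: null, stdev: null, nonMissing: 0 };
  const avg = values.reduce((s, v) => s + v, 0) / n;
  // Population standard deviation: divide by n, not n - 1.
  const variance = values.reduce((s, v) => s + (v - avg) ** 2, 0) / n;
  return { avg, stdev: Math.sqrt(variance), nonMissing: n };
}

const withGaps = [
  { b: 2 }, { b: 4 }, {}, { b: 4 }, { b: 4 },
  { b: "n/a" }, { b: 5 }, { b: 5 }, { b: 7 }, { b: 9 }
];
console.log(summarize(withGaps)); // → { avg: 5, stdev: 2, nonMissing: 8 }
```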

Get max from unwound arrays

I have a collection of documents where I want to find the maximum values of each of the ratios of every possible pair of fields in the data object. For example:
Documents:
[
{ data: { a: 1, b: 5, c: 2 } },
{ data: { a: 4, b: 1, c: 1 } },
{ data: { a: 2, b: 4, c: 3 } }
]
Desired output:
{
a: { a: 1, b: 4, c: 4 },
b: { a: 5, b: 1, c: 2.5 },
c: { a: 2, b: 1, c: 1 }
}
So the output a.b is the largest of the a:b ratios 1/5, 4/1, and 2/4.
So I figure I should first use $objectToArray to convert data and then $unwind the result, but I'm having a hard time figuring out how to group everything back together. The number of documents won't be too large, but the number of keys in data can be in the low thousands, so I'm not sure how well MongoDB will handle a bunch of $lookup stages comparing the values like that.
You can try the following aggregation:
db.col.aggregate([
{
$addFields: { data: { $objectToArray: "$data" } }
},
{
$project: {
pairs: {
$map: {
input: { $range: [ 0, { $multiply: [ { $size: "$data" }, { $size: "$data" } ] } ] },
as: "index",
in: {
$let: {
vars: {
leftIndex: { $floor: { $divide: [ "$$index", { $size: "$data" } ] } },
rightIndex: { $mod: [ "$$index", { $size: "$data" } ] }
},
in: {
l: { $arrayElemAt: [ "$data", "$$leftIndex" ] },
r: { $arrayElemAt: [ "$data", "$$rightIndex" ] }
}
}
}
}
}
}
},
{ $unwind: "$pairs" },
{
$group: {
_id: { l: "$pairs.l.k", r: "$pairs.r.k" },
value: { $max: { $divide: [ "$pairs.l.v", "$pairs.r.v" ] } }
}
},
{
$sort: {
"_id.l": 1, "_id.r": 1
}
},
{
$group: {
_id: "$_id.l",
values: { $push: { k: "$_id.r", v: "$value" } }
}
},
{
$addFields: { values: { $arrayToObject: "$values" } }
},
{
$project: {
root: [ { k: "$_id", v: "$values" } ]
}
},
{
$sort: { "root.k": 1 }
},
{
$replaceRoot: {
newRoot: {
$arrayToObject: "$root"
}
}
}
])
You need $objectToArray and $arrayToObject to convert between objects and arrays. The key point is that for each document you need to generate n x n pairs (3 x 3 = 9 in this case). You can perform that iteration with the $range operator, and then use $mod, and $divide with $floor, to turn each index into a pair of positions from (0,0) to (2,2). After that, $group with $max gives the maximum value for each kind of pair (a with b, and so on). Finally, $replaceRoot produces the desired output shape. Keep in mind that the number of pairs grows quadratically with the number of keys, so with keys in the low thousands each document generates millions of pairs, which may be slow and memory-hungry.
Outputs:
{ "a" : { "a" : 1, "b" : 4, "c" : 4 } }
{ "b" : { "a" : 5, "b" : 1, "c" : 2.5 } }
{ "c" : { "a" : 2, "b" : 1, "c" : 1 } }
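The index arithmetic is easiest to check outside the database. Below is a plain-JavaScript sketch (not MongoDB code) of the same idea, assuming the values in data are numeric and non-zero (a zero denominator would make $divide fail); maxRatios is a hypothetical name. Each flat index in 0..n*n-1 is decoded into a left position Math.floor(index / n) and a right position index % n, as $floor/$divide and $mod do in the pipeline, and a running maximum is kept per (left key, right key).

```javascript
// Hypothetical helper: in-memory version of the pair/ratio pipeline.
function maxRatios(docs) {
  const result = {};
  for (const { data } of docs) {
    const entries = Object.entries(data); // like $objectToArray: [[k, v], ...]
    const n = entries.length;
    // One flat index per (left, right) pair, like $range: [0, n * n].
    for (let index = 0; index < n * n; index++) {
      const [lk, lv] = entries[Math.floor(index / n)]; // leftIndex
      const [rk, rv] = entries[index % n];             // rightIndex
      const ratio = lv / rv;
      result[lk] = result[lk] || {};
      // Running maximum per (l, r), like $group with $max.
      if (result[lk][rk] === undefined || ratio > result[lk][rk]) {
        result[lk][rk] = ratio;
      }
    }
  }
  return result;
}

const sample = [
  { data: { a: 1, b: 5, c: 2 } },
  { data: { a: 4, b: 1, c: 1 } },
  { data: { a: 2, b: 4, c: 3 } }
];
console.log(JSON.stringify(maxRatios(sample)));
// → {"a":{"a":1,"b":4,"c":4},"b":{"a":5,"b":1,"c":2.5},"c":{"a":2,"b":1,"c":1}}
```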

How to sum up documents in MongoDB

Now I want to query: "sum the Score values between Q1 and Q5" (the result should be 4). Attached is a snapshot of the collection from mLab.
db.temp.aggregate({ $match: {
$and: [
{ QuestionNo: { $gte: 1 } },
{ QuestionNo: { $lte: 5 } }
]
}},
{ $group: { _id : null, sum : { $sum: "$Score" } } });
I don't see any response in the console. Any help is much appreciated.
First of all, there is no need to overcomplicate your $match query: what you want is all documents whose QuestionNo is between 1 and 5. I assume your documents look something like this:
{
"QuestionNo" : 1,
"Score" : 55
}
/* 2 */
{
"QuestionNo" : 2,
"Score" : 33
}
etc...
If you want to sum all the matching scores, you can do:
db.temp.aggregate([
{
$match: {
QuestionNo: { $gte: 1 , $lte: 5 }
}
},
{
$group: {
_id: null,
sum: { $sum: "$Score"}
}
}
])
If you want to group them by QuestionNo then you can do this:
db.temp.aggregate([
{
$match: {
QuestionNo: { $gte: 1 , $lte: 5 }
}
},
{
$group: {
_id: "$QuestionNo",
sum: { $sum: "$Score"}
}
}
])
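As a plain-JavaScript illustration (not MongoDB code) of what the two pipelines compute, here is a sketch assuming documents shaped like the example above with a numeric Score; sumScores, sumScoresByQuestion and the sample data are hypothetical. The range check corresponds to { $gte: 1, $lte: 5 }, which is inclusive at both ends.

```javascript
// Hypothetical helpers mirroring the two pipelines on an in-memory array.
function inRange(doc) {
  // Same bounds as QuestionNo: { $gte: 1, $lte: 5 } (inclusive at both ends).
  return doc.QuestionNo >= 1 && doc.QuestionNo <= 5;
}

// Like $group with _id: null: one total for all matching documents.
function sumScores(docs) {
  return docs.filter(inRange).reduce((total, d) => total + d.Score, 0);
}

// Like $group with _id: "$QuestionNo": one total per question number.
function sumScoresByQuestion(docs) {
  const totals = {};
  for (const d of docs.filter(inRange)) {
    totals[d.QuestionNo] = (totals[d.QuestionNo] || 0) + d.Score;
  }
  return totals;
}

const answers = [
  { QuestionNo: 1, Score: 55 },
  { QuestionNo: 2, Score: 33 },
  { QuestionNo: 2, Score: 10 },
  { QuestionNo: 6, Score: 99 }
];
console.log(sumScores(answers));                           // → 98
console.log(JSON.stringify(sumScoresByQuestion(answers))); // → {"1":55,"2":43}
```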

How to use nested grouping in MongoDB

I need to find the total count of duplicate profiles per organization. I have documents as shown below:
{
"OrganizationId" : 10,
"Profile" : {
"_id" : "75"
},
"_id" : "1"
},
{
"OrganizationId" : 10,
"Profile" : {
"_id" : "75"
},
"_id" : "2"
},
{
"OrganizationId" : 10,
"Profile" : {
"_id" : "77"
},
"_id" : "3"
},
{
"OrganizationId" : 10,
"Profile" : {
"_id" : "77"
},
"_id" : "4"
}
I have written a query that groups by profile ID and OrganizationId. The results I am getting are shown below:
Organization Total
10 2
10 2
But I want the sum of the totals per organization; that means organization 10 should have one row with a total of 4.
The query I am using is shown below:
db.getSiblingDB("dbName").OrgProfile.aggregate(
{ $project: { _id: 1, P: "$Profile._id", O: "$OrganizationId" } },
{ $group: {_id: { p: "$P", o: "$O"}, c: { $sum: 1 }} },
{ $match: { c: { $gt: 1 } } });
Any ideas? Please help.
The following pipeline should give you the desired output. The last $project stage is purely cosmetic (it renames _id to Organization) and is not needed for the computation, so you may omit it.
db.getCollection('yourCollection').aggregate([
{
$group: {
_id: { org: "$OrganizationId", profile: "$Profile._id" },
count: { $sum: 1 }
}
},
{
$group: {
_id: "$_id.org",
Total: {
$sum: {
$cond: {
if: { $gte: ["$count", 2] },
then: "$count",
else: 0
}
}
}
}
},
{
$project: {
_id: 0,
Organization: "$_id",
Total: 1
}
}
])
This gives the following output:
{
"Total" : 4.0,
"Organization" : 10
}
To drop organizations that have no duplicates at all, you can add a $match stage, which also simplifies the second $group stage:
...aggregate([
{
$group: {
_id: { org: "$OrganizationId", profile: "$Profile._id" },
count: { $sum: 1 }
}
},
{
$match: {
count: { $gte: 2 }
}
},
{
$group: {
_id: "$_id.org",
Total: { $sum: "$count" }
}
},
{
$project: {
_id: 0,
Organization: "$_id",
Total: 1
}
}
])
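For reference, here is a plain-JavaScript sketch (not MongoDB code) of what the pipeline with $match computes, assuming documents shaped like the ones in the question; duplicateTotals is a hypothetical name. It first counts documents per (organization, profile) pair, keeps only pairs seen at least twice, then sums those counts per organization.

```javascript
// Hypothetical helper mirroring the $group -> $match -> $group -> $project pipeline.
function duplicateTotals(docs) {
  // First $group: count documents per (OrganizationId, Profile._id) pair.
  const pairCounts = new Map();
  for (const d of docs) {
    const key = JSON.stringify([d.OrganizationId, d.Profile._id]);
    pairCounts.set(key, (pairCounts.get(key) || 0) + 1);
  }
  // $match: { count: { $gte: 2 } }, then second $group: sum per organization.
  const totals = new Map();
  for (const [key, count] of pairCounts) {
    if (count < 2) continue; // profile is not duplicated in this organization
    const [org] = JSON.parse(key);
    totals.set(org, (totals.get(org) || 0) + count);
  }
  // Final $project: rename the group key to Organization.
  return [...totals].map(([Organization, Total]) => ({ Organization, Total }));
}

const orgProfiles = [
  { _id: "1", OrganizationId: 10, Profile: { _id: "75" } },
  { _id: "2", OrganizationId: 10, Profile: { _id: "75" } },
  { _id: "3", OrganizationId: 10, Profile: { _id: "77" } },
  { _id: "4", OrganizationId: 10, Profile: { _id: "77" } }
];
console.log(JSON.stringify(duplicateTotals(orgProfiles)));
// → [{"Organization":10,"Total":4}]
```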
I think I have a solution for you. In that last step there, instead of matching, I think you want another $group.
.aggregate([
{ $project: { _id: 1, P: "$Profile._id", O: "$OrganizationId" } }
,{ $group: {_id: { p: "$P", o: "$O"}, c: { $sum: 1 }} }
,{ $group: { _id: "$_id.o" , c: { $sum: "$c" } }}
]);
You can probably work out what's happening in that last step yourself, but just in case, I'll explain. The last step groups all documents that have the same organization ID and sums the counts from the previous c field. After the first group, you had two documents that both had a count c of 2 but different profile IDs. The next group ignores the profile ID, groups them by organization ID and adds their counts. Note that, without the original $match: { c: { $gt: 1 } } stage, this also counts profiles that appear only once; keep that $match between the two $group stages if only duplicates should be counted.
When I ran this query, here is my result, which is what I think you're looking for:
{
"_id" : 10,
"c" : 4
}
Hope this helps. Let me know if you have any questions.

Mongo DB - Second Level Search - elemMatch

I am trying to fetch all records (and the count of all records) for a structure like the following:
{
id: 1,
level1: {
level2:
[
{
field1: "value1"
},
{
field1: "value1"
}
]
}
},
{
id: 2,
level1: {
level2:
[
{
field1: null
},
{
field1: "value1"
}
]
}
}
My requirement is to fetch the number of records that have field1 populated (in at least one element of level2). That is, I need to fetch either all such ids or the number of such ids.
The query I am using is:
db.table.find({},
{
_id = id,
value: {
$elemMatch: {'level1.level2.field1':{$exists: true}}
}
}
})
Please suggest.
EDIT1:
This is the question I was trying to ask in the comment. I couldn't explain it properly there, so I am editing the question instead.
{
id: 1,
level1: {
level2:
[
{
field1: "value1"
},
{
field1: "value1"
}
]
}
},
{
id: 2,
level1: {
level2:
[
{
field1: "value2"
},
{
field1: "value2"
},
{
field1: "value2"
}
]
}
},
{
id: 3,
level1: {
level2:
[
{
field1: "value1"
},
{
field1: "value1"
}
]
}
}
The query we used returns:
value1: 4
value2: 3
I want something like
value1: 2 // Once each for documents 1 & 3
value2: 1 // Once for document 2
You can do that with the following find query:
db.table.find({ "level1.level2" : { $elemMatch: { field1 : {$exists: true} } } }, {})
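This will return all documents that have a field1 in at least one element of the "level1.level2" array. Note that { $exists: true } also matches field1: null; use { $ne: null } instead if a null value should not count as populated. To get only the number of matching documents, pass the same filter to db.table.countDocuments().
For the question in your comment ("I had to return a grouping (and the corresponding count) for the values in field1"), you can use the following aggregation: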
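As a plain-JavaScript illustration (not MongoDB code) of which documents the $elemMatch find query above selects, here is a sketch assuming the elements of level2 are objects; withField1 and the sample data are hypothetical. The key check mirrors { $exists: true }, so an element with field1: null still counts.

```javascript
// Hypothetical helper: the documents the $elemMatch find query would return.
function withField1(docs) {
  return docs.filter(doc =>
    // Like $elemMatch: { field1: { $exists: true } }: at least one element
    // has the key, even if its value is null. Missing arrays match nothing.
    (doc.level1?.level2 ?? []).some(el => "field1" in el)
  );
}

const tableDocs = [
  { id: 1, level1: { level2: [{ field1: "value1" }, { field1: "value1" }] } },
  { id: 2, level1: { level2: [{ field1: null }, { field1: "value1" }] } },
  { id: 3, level1: { level2: [{ other: 1 }] } }
];
const matching = withField1(tableDocs);
console.log(matching.map(d => d.id)); // → [ 1, 2 ]
console.log(matching.length);         // → 2
```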
db.table.aggregate(
[
{
$unwind: "$level1.level2"
},
{
$match: { "level1.level2.field1" : { $exists: true } }
},
{
$group: {
_id : "$level1.level2.field1",
count : {$sum : 1}
}
}
]
);
UPDATE: For your question "value1 - 2: at level2, for a document, assume all values will be the same for field1":
I hope I understand your question correctly. Instead of grouping only on the value of field1, I added the document _id as an extra grouping key:
db.table.aggregate(
[
{
$unwind: "$level1.level2"
},
{
$match: {
"level1.level2.field1" : { $exists: true }
}
},
{
$group: {
_id : { id : "$_id", field1: "$level1.level2.field1" },
count : {$sum : 1}
}
}
]
);
UPDATE2:
I altered the aggregation and added an extra grouping stage; the aggregation below gives you the results you want:
db.table.aggregate(
[
{
$unwind: "$level1.level2"
},
{
$match: {
"level1.level2.field1" : { $exists: true }
}
},
{
$group: {
_id : { id : "$_id", field1: "$level1.level2.field1" }
}
},
{
$group: {
_id : { id : "$_id.field1"},
count : { $sum : 1}
}
}
]
);
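To see why the extra $group gives "once per document" counts, here is a plain-JavaScript sketch (not MongoDB code) of the UPDATE2 pipeline, assuming string values for field1 and documents keyed by _id as in the pipeline (the question's sample uses id); documentsPerValue and the sample data are hypothetical. The first $group collapses repeated values inside one document into a single (_id, field1) pair, and the second $group counts those pairs per value.

```javascript
// Hypothetical helper mirroring UPDATE2: $unwind, $match, then two $group stages.
function documentsPerValue(docs) {
  const seen = new Set(); // distinct (document _id, field1) pairs, like the first $group
  const counts = {};      // documents per field1 value, like the second $group
  for (const doc of docs) {
    // $unwind: visit each element of level1.level2 separately.
    for (const el of doc.level1?.level2 ?? []) {
      if (!("field1" in el)) continue; // $match: { $exists: true }
      const key = JSON.stringify([doc._id, el.field1]);
      if (seen.has(key)) continue;     // value already counted for this document
      seen.add(key);
      counts[el.field1] = (counts[el.field1] || 0) + 1;
    }
  }
  return counts;
}

const edited = [
  { _id: 1, level1: { level2: [{ field1: "value1" }, { field1: "value1" }] } },
  { _id: 2, level1: { level2: [{ field1: "value2" }, { field1: "value2" }, { field1: "value2" }] } },
  { _id: 3, level1: { level2: [{ field1: "value1" }, { field1: "value1" }] } }
];
console.log(JSON.stringify(documentsPerValue(edited))); // → {"value1":2,"value2":1}
```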