Set fields based on sub document aggregates - mongodb

In the collection below I am trying to calculate total / sold_total / sold_percent by aggregating the copies subdocument array.
Similarly, I want to calculate grand_total / sold_grand_total / sold_grand_percent by aggregating the inventory subdocument array.
For efficiency, I would prefer to do this during writes/updates, or with a MongoDB function/job, instead of during reads.
I have tried a couple of aggregation pipelines, but unwinding the copies sub-array loses everything above it. Any help appreciated, thanks.
{
"_id" : "xyz",
"store" : "StoreB",
"grand_total" : 7,
"sold_grand_total" : 5,
"sold_grand_percent" : 71,
"inventory" : [
{"title" : "BookA", "total" : 4, "sold_total" : 3, "sold_percent" : 75,
"copies" : [
{"_id": 1, "condition": "new", "sold": 1 },
{"_id": 2,"condition": "new", "sold": 1 },
{"_id": 3,"condition": "new", "sold": 0 },
{"_id": 4,"condition": "new", "sold": 1 }
]
},
{"title" : "BookB", "total" : 1, "sold_total" : 1, "sold_percent" : 100,
"copies" : [
{"_id": 1, "condition": "new", "sold": 1 }
]
},
{"title" : "BookC", "total" : 2, "sold_total" : 1, "sold_percent" : 50,
"copies" : [
{"_id": 1, "condition": "new", "sold": 1 },
{"_id": 2,"condition": "new", "sold": 0 }
]
}
]
}

There are multiple ways of doing this, and I am not sure what your architecture is.
Here are two separate aggregations.
This one gives "total" and "sold_total" per title within each store:
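[
  { "$unwind": "$inventory" },
  { "$unwind": "$inventory.copies" },
  { "$group": {
    "_id": { "store": "$_id", "title": "$inventory.title" },
    // every unwound copy counts once toward the total
    "total": { "$sum": 1 },
    // "sold" is 0 or 1, so summing it counts the sold copies
    "sold_total": { "$sum": "$inventory.copies.sold" }
  }}
]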
The other gives grand_total / sold_grand_total per store:
[
  { "$unwind": "$inventory" },
  { "$unwind": "$inventory.copies" },
  { "$group": {
    "_id": "$_id",
    "grand_total": { "$sum": 1 },
    "sold_grand_total": { "$sum": "$inventory.copies.sold" }
  }}
]
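You can also do both in one pipeline: group per store and title first, then feed those results into a second $group on the store to roll them up. Basically, project and chain the stages in a single pipeline. The percentages can then be computed in a $project stage with $divide and $multiply.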
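For comparison, here is a plain JavaScript sketch of the same two-level rollup, run client-side on one store document (for example just before writing it back). It totals each inventory entry's copies first, then sums those per-title results into the store's grand totals, the same order the chained $group stages would follow. The function name `recomputeTotals` and rounding the percentages with `Math.round` are assumptions, not part of the question.

```javascript
// Hypothetical helper: recompute per-title and per-store totals of one store
// document from its copies arrays. Rounding with Math.round is an assumption.
function recomputeTotals(storeDoc) {
  const percent = (sold, total) => (total === 0 ? 0 : Math.round((sold / total) * 100));

  // Level 1: one set of totals per inventory entry (like grouping unwound copies by title)
  const inventory = storeDoc.inventory.map((item) => {
    const total = item.copies.length;
    const sold_total = item.copies.reduce((sum, copy) => sum + copy.sold, 0);
    return { ...item, total, sold_total, sold_percent: percent(sold_total, total) };
  });

  // Level 2: roll the per-title totals up into the store's grand totals
  const grand_total = inventory.reduce((sum, item) => sum + item.total, 0);
  const sold_grand_total = inventory.reduce((sum, item) => sum + item.sold_total, 0);

  return {
    ...storeDoc,
    grand_total,
    sold_grand_total,
    sold_grand_percent: percent(sold_grand_total, grand_total),
    inventory,
  };
}

const store = {
  _id: "xyz",
  store: "StoreB",
  inventory: [
    { title: "BookA", copies: [{ _id: 1, sold: 1 }, { _id: 2, sold: 1 }, { _id: 3, sold: 0 }, { _id: 4, sold: 1 }] },
    { title: "BookB", copies: [{ _id: 1, sold: 1 }] },
    { title: "BookC", copies: [{ _id: 1, sold: 1 }, { _id: 2, sold: 0 }] },
  ],
};

const result = recomputeTotals(store);
console.log(result.grand_total, result.sold_grand_total, result.sold_grand_percent); // 7 5 71
```

Because `sold` is stored as 0 or 1, summing it counts the sold copies, and the copy count is simply the array length.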

Related

Aggregate array of subdocuments into single document

My document looks like the following (ignore timepoints for this question):
{
"_id": "xyz-800",
"site": "xyz",
"user": 800,
"timepoints": [
{"timepoint": 0, "a": 1500, "b": 700},
{"timepoint": 2, "a": 1000, "b": 200},
{"timepoint": 4, "a": 3500, "b": 1500}
],
"groupings": [
{"type": "MNO", "group": "<10%", "raw": "1"},
{"type": "IJK", "group": "Moderate", "raw": "23"}
]
}
Can I flatten it (maybe not the right term) so that the groupings end up in a single document? I would like the result to look like:
{
"id": "xyz-800",
"site": "xyz",
"user": 800,
"mnoGroup": "<10%",
"mnoRaw": "1",
"ijkGroup": "Moderate",
"ijkRaw": "23"
}
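In reality I would like the mnoGroup and mnoRaw attributes to be created whether or not a grouping with type "MNO" exists. Same with the ijk attributes.
You can use $arrayElemAt to read the groupings array by index in the first $project stage, and $ifNull to project optional values in the final $project stage. It's a little verbose, but I'll see what I can do to shorten it. Note that reading by index assumes the MNO grouping is always first and the IJK grouping always last; if the array holds only one entry, index 0 and index -1 both return that same entry.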
db.groupmore.aggregate({
"$project": {
_id: 1,
site: 1,
user: 1,
mnoGroup: {
$arrayElemAt: ["$groupings", 0]
},
ijkGroup: {
$arrayElemAt: ["$groupings", -1]
}
}
}, {
"$project": {
_id: 1,
site: 1,
user: 1,
mnoGroup: {
$ifNull: ["$mnoGroup.group", "Unspecified"]
},
mnoRaw: {
$ifNull: ["$mnoGroup.raw", "Unspecified"]
},
ijkGroup: {
$ifNull: ["$ijkGroup.group", "Unspecified"]
},
ijkRaw: {
$ifNull: ["$ijkGroup.raw", "Unspecified"]
}
}
})
Sample Output
{ "_id" : "xyz-800", "site" : "xyz", "user" : 800, "mnoGroup" : "<10%", "mnoRaw" : "1", "ijkGroup" : "Moderate", "ijkRaw" : "23" }
{ "_id" : "ert-600", "site" : "ert", "user" : 8600, "mnoGroup" : "Unspecified", "mnoRaw" : "Unspecified", "ijkGroup" : "Unspecified", "ijkRaw" : "Unspecified" }
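If the groupings can arrive in any order, or only one of them may be present, it is safer to look entries up by their `type` instead of their position. A plain JavaScript sketch of that flattening, falling back to "Unspecified" the way the $ifNull stages do, might look like this; `flattenGroupings` and the "abc-100" sample document are hypothetical. (On the server, $filter on the type combined with $arrayElemAt would play the role of `find` here.)

```javascript
// Hypothetical helper: flatten the groupings array into top-level fields,
// matching entries by type rather than by array position.
function flattenGroupings(doc, types = ["MNO", "IJK"]) {
  const flat = { _id: doc._id, site: doc.site, user: doc.user };
  for (const type of types) {
    const entry = (doc.groupings || []).find((g) => g.type === type);
    const prefix = type.toLowerCase(); // "MNO" -> "mno"
    // Like $ifNull: fall back when the grouping or its field is missing or null
    flat[prefix + "Group"] = entry && entry.group != null ? entry.group : "Unspecified";
    flat[prefix + "Raw"] = entry && entry.raw != null ? entry.raw : "Unspecified";
  }
  return flat;
}

const flatFull = flattenGroupings({
  _id: "xyz-800", site: "xyz", user: 800,
  groupings: [
    { type: "MNO", group: "<10%", raw: "1" },
    { type: "IJK", group: "Moderate", raw: "23" },
  ],
});

// Only an IJK entry: index-based reads would copy it into the mno fields too
const flatPartial = flattenGroupings({
  _id: "abc-100", site: "abc", user: 100,
  groupings: [{ type: "IJK", group: "Moderate", raw: "23" }],
});

console.log(flatFull);
console.log(flatPartial);
```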

MongoDB sort vs aggregate $sort on array index

With a MongoDB collection test containing the following documents:
{ "_id" : 1, "color" : "blue", "items" : [ 1, 2, 0 ] }
{ "_id" : 2, "color" : "red", "items" : [ 0, 3, 4 ] }
if I sort them in descending order based on the second element of the items array, using
db.test.find().sort({"items.1": -1})
they will be correctly sorted as:
{ "_id" : 2, "color" : "red", "items" : [ 0, 3, 4 ] }
{ "_id" : 1, "color" : "blue", "items" : [ 1, 2, 0 ] }
However, when I attempt to sort them using the aggregate function:
db.test.aggregate([{$sort: {"items.1": -1} }])
They will not sort correctly, even though the query is accepted as valid:
{
"result" : [
{
"_id" : 1,
"color" : "blue",
"items" : [
1,
2,
0
]
},
{
"_id" : 2,
"color" : "red",
"items" : [
0,
3,
4
]
}
],
"ok" : 1
}
Why is this?
The aggregation framework just does not "deal with" arrays in the same way as .find() queries do. This is true not only of operations like .sort() but also of other operators, namely $slice, though that example is about to get a fix (more later).
So it is pretty much impossible to use the "dot notation" form with an array index, as you have. But there is a way around this.
What you "can" do is basically work out what the "nth" array element actually is as a value, and then return that as a field that can be sorted:
db.test.aggregate([
{ "$unwind": "$items" },
{ "$group": {
"_id": "$_id",
"items": { "$push": "$items" },
"itemsCopy": { "$push": "$items" },
"first": { "$first": "$items" }
}},
{ "$unwind": "$itemsCopy" },
{ "$project": {
"items": 1,
"itemsCopy": 1,
"first": 1,
"seen": { "$eq": [ "$itemsCopy", "$first" ] }
}},
{ "$match": { "seen": false } },
{ "$group": {
"_id": "$_id",
"items": { "$first": "$items" },
"itemsCopy": { "$push": "$itemsCopy" },
"first": { "$first": "$first" },
"second": { "$first": "$itemsCopy" }
}},
{ "$sort": { "second": -1 } }
])
It's a horrible "iterative" approach where you essentially "step through" each array element by getting the $first match per document from the array after processing with $unwind. Then, after another $unwind, you test whether each array element is the same as the one(s) already "seen" at the identified array positions. It also assumes the first value does not repeat later in the array, since every element equal to it is filtered out.
It's terrible, and it gets worse the further along the array you need to go, but it does get the result (this test data also includes a third document with "items" : [ 2, 1, 5 ]):
{ "_id" : 2, "items" : [ 0, 3, 4 ], "itemsCopy" : [ 3, 4 ], "first" : 0, "second" : 3 }
{ "_id" : 1, "items" : [ 1, 2, 0 ], "itemsCopy" : [ 2, 0 ], "first" : 1, "second" : 2 }
{ "_id" : 3, "items" : [ 2, 1, 5 ], "itemsCopy" : [ 1, 5 ], "first" : 2, "second" : 1 }
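Outside the server, the idea behind that pipeline (turn the element at index 1 into an ordinary value, then sort on it) takes only a few lines of JavaScript. This sketch runs over the two documents from the question and only illustrates what the pipeline computes; `sortByNthDesc` is a hypothetical name, and it assumes every array actually has an element at index `n`.

```javascript
// Hypothetical helper: sort documents descending by the n-th element of an array field.
// Assumes doc[field][n] exists for every document.
function sortByNthDesc(docs, field, n) {
  return docs
    // Pull the n-th element out as a plain sort key, like the "second" field above
    .map((doc) => ({ doc, key: doc[field][n] }))
    .sort((a, b) => b.key - a.key) // descending by that key
    .map(({ doc }) => doc);
}

const arrayDocs = [
  { _id: 1, color: "blue", items: [1, 2, 0] },
  { _id: 2, color: "red", items: [0, 3, 4] },
];

const sorted = sortByNthDesc(arrayDocs, "items", 1);
console.log(sorted.map((d) => d._id)); // [ 2, 1 ]
```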
Fortunately, upcoming releases of MongoDB (as currently available in development releases) get a "fix" for this. It may not be the "perfect" fix that you desire, but it does solve the basic problem.
There is a new $slice operator available for the aggregation framework there (it shipped in MongoDB 3.2), and it will return the required element(s) of the array from the indexed positions:
db.test.aggregate([
{ "$project": {
"items": 1,
"slice": { "$slice": [ "$items",1,1 ] }
}},
{ "$sort": { "slice": -1 } }
])
Which produces:
{ "_id" : 2, "items" : [ 0, 3, 4 ], "slice" : [ 3 ] }
{ "_id" : 1, "items" : [ 1, 2, 0 ], "slice" : [ 2 ] }
{ "_id" : 3, "items" : [ 2, 1, 5 ], "slice" : [ 1 ] }
So you can note that as a "slice", the result is still an "array". In current MongoDB versions, sorting on an array field uses the lowest element as the key for an ascending sort and the highest element for a descending sort. With a single value extracted from the indexed position (just as in the long procedure above), the lowest and highest elements are the same value, so the result is sorted as you expect.
The bottom line is that this is just how it works. Either live with the sort of operations above to work with an indexed position of the array, or "wait" until a brand new shiny version comes to your rescue with better operators.

MongoDB: aggregating fields from arrays of subdocuments

I have a MongoDB collection called Events, containing baseball games. Here is an example of one document in the collection:
{
"name" : "Game# 814",
"dateStart" : ISODate("2012-09-28T14:47:53.695Z"),
"_id" : ObjectId("53a1b24de3f25f4443d9747e"),
"stats" : [
{
"team" : ObjectId("53a11a43a8de6dd8375c940b"),
"teamName" : "Reds",
"_id" : ObjectId("53a1b24de3f25f4443d97480"),
"score" : 17
},
{
"team" : ObjectId("53a11a43a8de6dd8375c938d"),
"teamName" : "Yankees",
"_id" : ObjectId("53a1b24de3f25f4443d9747f"),
"score" : 12
}
],
"__v" : 0
}
I need help writing the query that returns standings for all teams. The result set should look like:
{
"team" : ObjectId("53a11a43a8de6dd8375c938d"),
"teamName" : "Yankees",
"wins" : <<number of Yankees wins>>,
"losses" : <<number of Yankees losses>>,
"draws" : <<number of Yankees draws>>
}
{
"team" : ObjectId("53a11a43a8de6dd8375c940b"),
"teamName" : "Reds",
"wins" : <<number of Reds wins>>,
"losses" : <<number of Reds losses>>,
"draws" : <<number of Reds draws>>
}
...
Here's the query I've started with...
db.events.aggregate(
{"$unwind": "$stats" },
{ $group : {
_id : "$stats.team",
gamesPlayed : { $sum : 1},
totalScore : { $sum : "$stats.score" }
}}
);
... which returns results:
{
"result" : [
{
"_id" : ObjectId("53a11a43a8de6dd8375c93cb"),
"gamesPlayed" : 125, // not a requirement... just trying to get $sum working
"totalScore" : 1213 // ...same here
},
{
"_id" : ObjectId("53a11a44a8de6dd8375c955f"),
"gamesPlayed" : 128,
"totalScore" : 1276
},
{
"_id" : ObjectId("53a11a44a8de6dd8375c9661"),
"gamesPlayed" : 152,
"totalScore" : 1509
},
....
It would seem advisable to keep your "wins", "losses" and "draws" in your documents as you create or update them. But it is possible to do with aggregate, if a little long-winded:
db.events.aggregate([
// Unwind the "stats" array
{ "$unwind": "$stats" },
// Combine the document with new fields
{ "$group": {
"_id": "$_id",
"firstTeam": { "$first": "$stats.team" },
"firstTeamName": { "$first": "$stats.teamName" },
"firstScore": { "$first": "$stats.score" },
"lastTeam": { "$last": "$stats.team" },
"lastTeamName": { "$last": "$stats.teamName" },
"lastScore": { "$last": "$stats.score" },
"minScore": { "$min": "$stats.score" },
"maxScore": { "$max": "$stats.score" }
}},
// Calculate by comparing scores
{ "$project": {
"firstTeam": 1,
"firstTeamName": 1,
"firstScore": 1,
"lastTeam": 1,
"lastTeamName": 1,
"lastScore": 1,
"firstWins": {
"$cond": [
{ "$gt": [ "$firstScore", "$lastScore" ] },
1,
0
]
},
"firstLosses": {
"$cond": [
{ "$lt": [ "$firstScore", "$lastScore" ] },
1,
0
]
},
"firstDraws": {
"$cond": [
{ "$eq": [ "$firstScore", "$lastScore" ] },
1,
0
]
},
"lastWins": {
"$cond": [
{ "$gt": [ "$lastScore", "$firstScore" ] },
1,
0
]
},
"lastLosses": {
"$cond": [
{ "$lt": [ "$lastScore", "$firstScore" ] },
1,
0
]
},
"lastDraws": {
"$cond": [
{ "$eq": [ "$lastScore", "$firstScore" ] },
1,
0
]
},
"type": { "$literal": [ true, false ] }
}},
// Unwind the "type"
{ "$unwind": "$type" },
// Group teams conditionally on "type"
{ "$group": {
"_id": {
"team": {
"$cond": [
"$type",
"$firstTeam",
"$lastTeam"
]
},
"teamName": {
"$cond": [
"$type",
"$firstTeamName",
"$lastTeamName"
]
}
},
"owins": {
"$sum": {
"$cond": [
"$type",
"$firstWins",
"$lastWins"
]
}
},
"olosses": {
"$sum": {
"$cond": [
"$type",
"$firstLosses",
"$lastLosses"
]
}
},
"odraws": {
"$sum": {
"$cond": [
"$type",
"$firstDraws",
"$lastDraws"
]
}
}
}},
// Project your final form
{ "$project": {
"_id": 0,
"team": "$_id.team",
"teamName": "$_id.teamName",
"wins": "$owins",
"losses": "$olosses",
"draws": "$odraws"
}}
])
The first part is to "re-shape" the document by unwinding the array and then grouping with "first" and "last" for defining fields for your two teams.
Then you want to $project through those documents and calculate the "wins", "losses" and "draws" for each team in the pairing. Adding an array field holding the two values true/false is also convenient here. If you are on a pre-2.6 version of MongoDB, $literal can be replaced with $const, which is not documented but does the same thing.
Once you $unwind that "type" array, the documents can be split apart in the $group stage by evaluating whether to choose the "first" or "last" team field values via the use of $cond. This is a ternary operator that evaluates a true/false condition and returns the appropriate value according to that condition.
With a final $project your documents are formed exactly how you want.
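To check what that pipeline should return, here is a plain JavaScript sketch of the same bookkeeping done client-side: for every game it credits a win, loss or draw to each side by comparing the two scores, keyed by team. `computeStandings` is a hypothetical name, the team ids are shortened to strings, and, like the pipeline's first/last logic, it assumes each `stats` array holds exactly two entries.

```javascript
// Hypothetical helper: tally wins/losses/draws per team from two-team games.
function computeStandings(events) {
  const table = new Map(); // team id -> standings row

  const row = (entry) => {
    if (!table.has(entry.team)) {
      table.set(entry.team, { team: entry.team, teamName: entry.teamName, wins: 0, losses: 0, draws: 0 });
    }
    return table.get(entry.team);
  };

  for (const event of events) {
    const [first, last] = event.stats; // assumes exactly two teams per game
    const a = row(first);
    const b = row(last);
    if (first.score > last.score) { a.wins++; b.losses++; }
    else if (first.score < last.score) { a.losses++; b.wins++; }
    else { a.draws++; b.draws++; }
  }
  return [...table.values()];
}

const games = [
  { name: "Game# 814", stats: [
    { team: "reds", teamName: "Reds", score: 17 },
    { team: "yankees", teamName: "Yankees", score: 12 },
  ] },
];

console.log(computeStandings(games));
```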

Mongo aggregate nested array

I have a mongo collection with following structure
{
"userId" : ObjectId("XXX"),
"itemId" : ObjectId("YYY"),
"resourceId" : 1,
"_id" : ObjectId("528455229486ca3606004ec9"),
"parameter" : [
{
"name" : "name1",
"value" : 150,
"_id" : ObjectId("528455359486ca3606004eed")
},
{
"name" : "name2",
"value" : 0,
"_id" : ObjectId("528455359486ca3606004eec")
},
{
"name" : "name3",
"value" : 2,
"_id" : ObjectId("528455359486ca3606004eeb")
}
]
}
There can be multiple documents with the same 'userId' and different 'itemId' values, but the parameter array will have the same key/value pairs in all of them.
What I am trying to accomplish is to return the aggregated parameters "name1", "name2" and "name3" for each unique "userId", regardless of 'itemId', so the final results would look like this for each user:
{
"userId" : ObjectId("use1ID"),
"name1" : (aggregatedValue),
"name2" : (aggregatedValue),
"name3" : (aggregatedValue)
},
{
"userId" : ObjectId("use2ID"),
"name1" : (aggregatedValue),
"name2" : (aggregatedValue),
"name3" : (aggregatedValue)
}
Is it possible to accomplish this using MongoDB's aggregation methods? Could you please help me build the proper query?
The simplest form of this is to keep things keyed by the "parameter" "name":
db.collection.aggregate(
// Unwind the array
{ "$unwind": "$parameter"},
// Group on "userId" and "name" and $sum "value"
{ "$group": {
"_id": {
"userId": "$userId",
"name": "$parameter.name"
},
"value": { "$sum": "$parameter.value" }
}},
// Put things into an array for "nice" processing
{ "$group": {
"_id": "$_id.userId",
"values": { "$push": {
"name": "$_id.name",
"value": "$value"
}}
}}
)
If you really need to have the "values" of names as the field names, you can do the following. But since you are "projecting" the fields/properties, you must specify them all in your code. You cannot be "dynamic" anymore; you are coding/generating each one:
db.collection.aggregate([
// Unwind the array
{ "$unwind": "$parameter"},
// Group on "userId" and "name" and $sum "value"
{ "$group": {
"_id": {
"userId": "$userId",
"name": "$parameter.name"
},
"value": { "$sum": "$parameter.value"}
}},
// Project out discrete "field" names with $cond
{ "$project": {
"name1": { "$cond": [
{ "$eq": [ "$_id.name", "name1" ] },
"$value",
0
]},
"name2": { "$cond": [
{ "$eq": [ "$_id.name", "name2" ] },
"$value",
0
]},
"name3": { "$cond": [
{ "$eq": [ "$_id.name", "name3" ] },
"$value",
0
]},
}},
// The $cond put "0" values in there. So clean up with $group and $sum
{ "$group": {
_id: "$_id.userId",
"name1": { "$sum": "$name1" },
"name2": { "$sum": "$name2" },
"name3": { "$sum": "$name3" }
}}
])
So while the extra steps give you the result that you want (well, with a final $project to rename _id to userId), to my mind the short version is workable enough unless you really do need the field form. Consider its output as well:
{
"_id" : ObjectId("53245016ea402b31d77b0372"),
"values" : [
{
"name" : "name3",
"value" : 2
},
{
"name" : "name2",
"value" : 0
},
{
"name" : "name1",
"value" : 150
}
]
}
So that is what I would use, personally. But it's your choice.
Not sure if I got your question, but if the name field can only contain "name1", "name2" or "name3", or at least you are only interested in these values, one possible query is this one:
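Whichever output shape you pick, the grouping itself is easy to sanity-check with a client-side JavaScript sketch: sum each parameter `value` per `userId` and `name`, then pivot the names into fields. The helper name `sumParametersByUser`, the string ids and the sample documents are hypothetical; only the field names come from the question. Because it runs in application code, it picks up whatever names appear instead of needing them listed up front.

```javascript
// Hypothetical helper: sum parameter values per user and name, then pivot
// each name into its own field (the "field per name" output shape).
function sumParametersByUser(docs) {
  const byUser = new Map(); // userId -> { userId, name1: ..., name2: ..., ... }
  for (const doc of docs) {
    if (!byUser.has(doc.userId)) byUser.set(doc.userId, { userId: doc.userId });
    const totals = byUser.get(doc.userId);
    for (const { name, value } of doc.parameter) {
      totals[name] = (totals[name] || 0) + value; // like $sum on "$parameter.value"
    }
  }
  return [...byUser.values()];
}

const paramDocs = [
  { userId: "user1", itemId: "itemA", parameter: [
    { name: "name1", value: 150 }, { name: "name2", value: 0 }, { name: "name3", value: 2 } ] },
  { userId: "user1", itemId: "itemB", parameter: [
    { name: "name1", value: 10 }, { name: "name2", value: 5 }, { name: "name3", value: 1 } ] },
  { userId: "user2", itemId: "itemA", parameter: [
    { name: "name1", value: 7 }, { name: "name2", value: 0 }, { name: "name3", value: 0 } ] },
];

console.log(sumParametersByUser(paramDocs));
```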
db.aggTest.aggregate(
{$unwind:"$parameter"},
{$project: {"userId":1, "parameter.name":1,
"name1" : {"$cond": [{$eq : ["$parameter.name", "name1"]}, "$parameter.value", 0]},
"name2" : {"$cond": [{$eq : ["$parameter.name", "name2"]}, "$parameter.value", 0]},
"name3" : {"$cond": [{$eq : ["$parameter.name", "name3"]}, "$parameter.value", 0]}}},
{$group : {_id : {userId:"$userId"},
name1 : {$sum:"$name1"},
name2 : {$sum:"$name2"},
name3 : {$sum:"$name3"}}})
It first unwinds the parameter array, then separates the name1, name2 and name3 values into different columns with a simple conditional. After that we can easily aggregate by the new columns.
Hope it helps!

Sort working opposite

I have a mongo collection:
/* 0 */
{
"_id" : ObjectId("51f1fcc08188d3117c6da351"),
"cust_id" : "abc123",
"ord_date" : ISODate("2012-10-03T18:30:00Z"),
"status" : "A",
"price" : 25,
"items" : [{
"sku" : "ggg",
"qty" : 7,
"price" : 2.5
}, {
"sku" : "ppp",
"qty" : 5,
"price" : 2.5
}]
}
/* 1 */
{
"_id" : ObjectId("51fa1c318188d305fcbf9f9b"),
"cust_id" : "abc123",
"ord_date" : ISODate("2012-10-03T18:30:00Z"),
"status" : "A",
"price" : 27,
"items" : [{
"sku" : "ggg",
"qty" : 7,
"price" : 2.5
}, {
"sku" : "ppp",
"qty" : 5,
"price" : 2.5
}]
}
When I run this aggregate query to sort in descending order:
db.orders.aggregate([{
"$unwind": "$items"
}, {
"$sort": {
"price": -1
}
}, {
"$match": {}
}, {
"$group": {
"price": {
"$first": "$price"
},
"items": {
"$push": {
"sku": "$items.sku"
}
},
"_id": "$_id"
}
}, {
"$project": {
"_id": 0,
"price": 1,
"items": 1
}
}])
I get result:
{
"result": [{
"price": 25,
"items": [{
"sku": "ggg"
}, {
"sku": "ppp"
}]
}, {
"price": 27,
"items": [{
"sku": "ggg"
}, {
"sku": "ppp"
}]
}]
}
i.e. it is sorting in ascending order instead, and vice versa.
Move the $sort after the $group: $group does not guarantee any output order, so a sort done before grouping is lost.
db.orders.aggregate([{
"$unwind": "$items"
}, {
"$match": {}
}, {
"$group": {
"price": {
"$first": "$price"
},
"items": {
"$push": {
"sku": "$items.sku"
}
},
"_id": "$_id"
}
}, {
"$sort": {
"price": -1
}
}, {
"$project": {
"_id": 0,
"price": 1,
"items": 1
}
}])
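The corrected order of operations can be sketched in plain JavaScript: unwind, regroup per `_id`, and only then sort by `price` descending. A JS `Map` happens to keep insertion order (unlike $group), so this shows the fixed pipeline's steps rather than reproducing the bug; `groupThenSort` and the shortened order ids are hypothetical.

```javascript
// Hypothetical sketch of the corrected stage order: $unwind -> $group -> $sort.
function groupThenSort(orders) {
  // $unwind: one record per (order, item)
  const unwound = orders.flatMap((o) => o.items.map((item) => ({ _id: o._id, price: o.price, item })));

  // $group by _id: keep the first price, push each item's sku
  const groups = new Map();
  for (const { _id, price, item } of unwound) {
    if (!groups.has(_id)) groups.set(_id, { price, items: [] });
    groups.get(_id).items.push({ sku: item.sku });
  }

  // $sort only after grouping, descending by price; rows already have the
  // { price, items } shape that the final $project produces
  return [...groups.values()].sort((a, b) => b.price - a.price);
}

const orders = [
  { _id: "order0", cust_id: "abc123", status: "A", price: 25,
    items: [{ sku: "ggg", qty: 7, price: 2.5 }, { sku: "ppp", qty: 5, price: 2.5 }] },
  { _id: "order1", cust_id: "abc123", status: "A", price: 27,
    items: [{ sku: "ggg", qty: 7, price: 2.5 }, { sku: "ppp", qty: 5, price: 2.5 }] },
];

console.log(groupThenSort(orders).map((o) => o.price)); // [ 27, 25 ]
```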
As for the $natural operator, this is what the documentation says:
"The $natural operator uses the following syntax to return documents in the order they exist on disk"
Long story short, the order you see is not necessarily consistent with the order in which the documents are stored in the DB.