MongoDB - aggregate function on nested objects

We need to calculate the minimum bounding rectangle (MBR) on geospatial data.
In Oracle we have the SDO_AGGR_MBR function; is there any similar function in MongoDB?
"coord" : {
"type" : "Polygon",
"coordinates" : [
[
[
25.5377574375611,
42.8545750237221
],
[
47.7803203666229,
42.8545750237221
],
[
47.7803203661319,
52.0987759993153
],
[
25.5377574370701,
52.0987759993153
],
[
25.5377574375611,
42.8545750237221
]
]
]
}
We have geometry data like the above, but the length of the coordinates array may vary. So we need to find the minX, minY, maxX and maxY from these data.

I don't think there is a built-in function for it, but you can simply do the following:
db.collection.aggregate([
    { $unwind: "$coord.coordinates" },
    { $unwind: "$coord.coordinates" },
    { $group: {
        _id: null,
        minX: { $min: { $arrayElemAt: [ "$coord.coordinates", 0 ] } },
        maxX: { $max: { $arrayElemAt: [ "$coord.coordinates", 0 ] } },
        minY: { $min: { $arrayElemAt: [ "$coord.coordinates", 1 ] } },
        maxY: { $max: { $arrayElemAt: [ "$coord.coordinates", 1 ] } }
    }}
])
This first unwraps the nested coordinates array with $unwind (twice, to remove the extra [ ] ring level) so that the pipeline can iterate over the individual coordinate pairs. Note the path is $coord.coordinates to match the sample document. Then a $group with _id: null, which is a special value that groups all input documents together, evaluates the min/max values across all pairs.
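For illustration, after the two $unwind stages each document in the pipeline holds just a single coordinate pair, e.g.:
{ "coord" : { "type" : "Polygon", "coordinates" : [ 25.5377574375611, 42.8545750237221 ] } }
{ "coord" : { "type" : "Polygon", "coordinates" : [ 47.7803203666229, 42.8545750237221 ] } }
and so on for each pair, which is what the $arrayElemAt expressions in the $group then pick the x and y values from.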
Running the pipeline gets you the response you requested:
[
    {
        "_id": null,
        "maxX": 47.7803203666229,
        "maxY": 52.0987759993153,
        "minX": 25.5377574370701,
        "minY": 42.8545750237221
    }
]

The most efficient way is to employ the $map and $reduce operators along with $let. This allows you to process each document by manipulating the arrays inline, and then use the $min and $max operators to obtain the bounding values:
db.collection.aggregate([
    { "$replaceRoot": {
        "newRoot": {
            "$let": {
                "vars": {
                    "m": {
                        "$map": {
                            "input": {
                                "$reduce": {
                                    "input": "$coord.coordinates",
                                    "initialValue": [],
                                    "in": { "$concatArrays": [ "$$value", "$$this" ] }
                                }
                            },
                            "in": {
                                "x": { "$arrayElemAt": [ "$$this", 0 ] },
                                "y": { "$arrayElemAt": [ "$$this", 1 ] }
                            }
                        }
                    }
                },
                "in": {
                    "_id": "$_id",
                    "coord": "$coord",
                    "minX": { "$min": "$$m.x" },
                    "minY": { "$min": "$$m.y" },
                    "maxX": { "$max": "$$m.x" },
                    "maxY": { "$max": "$$m.y" }
                }
            }
        }
    }}
])
And the output would be like:
{
    "_id" : ObjectId("5d9330e95994eb7018f59218"),
    "coord" : {
        "type" : "Polygon",
        "coordinates" : [
            [
                [ 25.5377574375611, 42.8545750237221 ],
                [ 47.7803203666229, 42.8545750237221 ],
                [ 47.7803203661319, 52.0987759993153 ],
                [ 25.5377574370701, 52.0987759993153 ],
                [ 25.5377574375611, 42.8545750237221 ]
            ]
        ]
    },
    "minX" : 25.5377574370701,
    "minY" : 42.8545750237221,
    "maxX" : 47.7803203666229,
    "maxY" : 52.0987759993153
}
Note the usage of the $replaceRoot aggregation pipeline stage, as this allows the expressions nested within $let to essentially provide "global variables" to the document being produced, which can then be utilized in any output property.
The $reduce here basically serves to flatten the array from the standard GeoJSON form into just an array of coordinate pairs, without the additional bounding array.
This then feeds input to the $map, which employs $arrayElemAt in order to re-map each coordinate pair into an object with x and y keys. This makes things much simpler for the actual output section of the $let.
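For the sample polygon, the variable m therefore ends up holding the flattened, re-mapped list:
[
    { "x": 25.5377574375611, "y": 42.8545750237221 },
    { "x": 47.7803203666229, "y": 42.8545750237221 },
    { "x": 47.7803203661319, "y": 52.0987759993153 },
    { "x": 25.5377574370701, "y": 52.0987759993153 },
    { "x": 25.5377574375611, "y": 42.8545750237221 }
]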
Note: An alternate approach to using $arrayElemAt against each key within the $map might well be to use $zip and $arrayToObject:
"in": {
"$arrayToObject": { "$zip": { "inputs": [ ["x","y"], "$$this" ] } }
}
It has the same principle in the overall output, but takes advantage of $zip producing "paired" arrays, which also happens to be valid input for $arrayToObject to produce the final object form.
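Taking the first coordinate pair as an example, $zip pairs the key names with the values, and $arrayToObject then assembles the final object:
{ "$zip": { "inputs": [ ["x","y"], [ 25.5377574375611, 42.8545750237221 ] ] } }
// => [ [ "x", 25.5377574375611 ], [ "y", 42.8545750237221 ] ]
// $arrayToObject then yields { "x": 25.5377574375611, "y": 42.8545750237221 }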
In the final part, we now basically have an array of objects with the named keys x and y. MongoDB allows a convenient way to extract just the values for those named keys with notation like "$$m.x", where the "$$m" expression refers to the named variable of the $let (our array of objects), and the .x part of course means only the values of x. This is a basic shorthand for a $map statement in itself, which suits this particular usage.
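So "$$m.x" here is effectively shorthand for a $map statement such as:
{ "$map": { "input": "$$m", "as": "p", "in": "$$p.x" } }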
These arrays of values for specific properties can now be applied to the $min and $max operators, and this is how you get your min and max coordinates for a bounding rectangle.
Note that inline operators for arrays should always be preferred to $unwind.
The $unwind aggregation pipeline stage was an old introductory way of dealing with array elements by essentially flattening them into separate documents.
Though necessary when you actually want to group on a value as a key that comes from within an array, most operations which don't actually need that ( like this one ) can be done with more modern approaches.
The usage of $unwind actually carries a huge performance penalty, as its function is essentially to replicate the entire parent document content for each element of the array into its own new document. Particularly on large datasets this has a very negative effect on performance, due to much increased I/O and memory usage.
The main lesson being: unless it's necessary to the operation being performed (and here it is not), you should not be using $unwind in an aggregation pipeline. It might look easier to understand, but your code is actually hurting the system it's running on by including it.
Alternate Client Approach
Note also that if you don't actually need these results for any further aggregation processing, then it's probably a lot cleaner to code in the client as each document is processed.
For example, here's a plain JavaScript version for the shell:
db.collection.find().map(({ _id, coord }) =>
    (({ coordinates }) =>
        ({
            _id,
            coord,
            ...(m =>
                ({
                    minX: Math.min(...m.map(({ x }) => x)),
                    minY: Math.min(...m.map(({ y }) => y)),
                    maxX: Math.max(...m.map(({ x }) => x)),
                    maxY: Math.max(...m.map(({ y }) => y))
                })
            )(
                ((c) => c.reduce((o, e) => [ ...o, ...e ], []).map(([x, y]) => ({ x, y })))(coordinates)
            )
        })
    )(coord)
)
That has exactly the same output and is not nearly as unwieldy as the BSON operator statements required for an aggregation pipeline.

Related

Find Index of first Matching Element $gte with $indexOfArray

MongoDB has $indexOfArray to let you find the element's array index, for example:
$indexOfArray: ["$article.date", ISODate("2019-03-29")]
Is it possible to use comparison operators together with $indexOfArray, like:
$indexOfArray: ["$article.date", {$gte: ISODate("2019-03-29")}]
No, it's not possible with $indexOfArray, as that will only look for an equality match with the expression given as the second argument.
Instead you can make a construct like this:
db.data.insertOne({
    "_id" : ObjectId("5ca01e301a97dd8b468b3f55"),
    "array" : [
        ISODate("2018-03-01T00:00:00Z"),
        ISODate("2018-03-02T00:00:00Z"),
        ISODate("2018-03-03T00:00:00Z")
    ]
})
db.data.aggregate([
    { "$addFields": {
        "matchedIndex": {
            "$let": {
                "vars": {
                    "matched": {
                        "$arrayElemAt": [
                            { "$filter": {
                                "input": {
                                    "$zip": {
                                        "inputs": [ "$array", { "$range": [ 0, { "$size": "$array" } ] } ]
                                    }
                                },
                                "cond": { "$gte": [ { "$arrayElemAt": [ "$$this", 0 ] }, new Date("2018-03-02") ] }
                            }},
                            0
                        ]
                    }
                },
                "in": {
                    "$arrayElemAt": [ { "$ifNull": [ "$$matched", [0,-1] ] }, 1 ]
                }
            }
        }
    }}
])
Which would return for the $gte of Date("2018-03-02"):
{
    "_id" : ObjectId("5ca01e301a97dd8b468b3f55"),
    "array" : [
        ISODate("2018-03-01T00:00:00Z"),
        ISODate("2018-03-02T00:00:00Z"),
        ISODate("2018-03-03T00:00:00Z")
    ],
    "matchedIndex" : 1
}
Or -1 where the condition was not met in order to be consistent with $indexOfArray.
The basic premise is using $zip in order to "pair" the array content with the index positions generated from $range and $size of the array. This can be fed to a $filter condition which will return ALL elements matching the supplied condition. Here the condition is that the first element of the "pair" (being the original array content), accessed via $arrayElemAt, matches the specified $gte:
{ "$gte": [ { "$arrayElemAt": ["$$this", 0] }, new Date("2018-03-02") ] }
The $filter will return either ALL elements matching the condition or an empty array where nothing was found. Consistent with $indexOfArray, you only want the first match, which is done with another wrapping $arrayElemAt on the output for the 0 position.
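For the sample document, the $zip produces the index-paired array and the $filter with the $gte condition then trims it to the matching entries:
[ [ ISODate("2018-03-01T00:00:00Z"), 0 ], [ ISODate("2018-03-02T00:00:00Z"), 1 ], [ ISODate("2018-03-03T00:00:00Z"), 2 ] ]
// after $filter with $gte new Date("2018-03-02"):
[ [ ISODate("2018-03-02T00:00:00Z"), 1 ], [ ISODate("2018-03-03T00:00:00Z"), 2 ] ]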
Since the result could be a missing value (which is what happens with $arrayElemAt: [[], 0]), you use $ifNull to test the result and pass back a two-element array with -1 as the second element for the case where the output was not defined. In either case, that "paired" array has the second element (index 1) extracted, again via $arrayElemAt, in order to get the first matched index for the condition.
Of course, since you want to refer to that whole expression, it just reads a little cleaner in the end within a $let, but that is optional, as you can "inline" it with the $ifNull if wanted.
So it is possible, it's just a little more involved than placing a range expression inside of $indexOfArray.
Note that any expression which actually returns a single value for an equality match is just fine. But since operators like $gte return a boolean, that would not be equal to any value in the array, and thus the sort of processing with $filter and subsequent extraction is what you require.

Return only matched sub-document elements within a nested array

The main collection is retailer, which contains an array for stores. Each store contains an array of offers (you can buy in this store). This offers array has an array of sizes. (See example below)
Now I try to find all offers, which are available in the size L.
{
    "_id" : ObjectId("56f277b1279871c20b8b4567"),
    "stores" : [
        {
            "_id" : ObjectId("56f277b5279871c20b8b4783"),
            "offers" : [
                {
                    "_id" : ObjectId("56f277b1279871c20b8b4567"),
                    "size": [ "XS", "S", "M" ]
                },
                {
                    "_id" : ObjectId("56f277b1279871c20b8b4567"),
                    "size": [ "S", "L", "XL" ]
                }
            ]
        }
    ]
}
I've tried this query: db.getCollection('retailers').find({'stores.offers.size': 'L'})
I expect some output like this:
{
    "_id" : ObjectId("56f277b1279871c20b8b4567"),
    "stores" : [
        {
            "_id" : ObjectId("56f277b5279871c20b8b4783"),
            "offers" : [
                {
                    "_id" : ObjectId("56f277b1279871c20b8b4567"),
                    "size": [ "S", "L", "XL" ]
                }
            ]
        }
    ]
}
But the output of my query also contains the non-matching offer with sizes XS, S and M.
How can I force MongoDB to return only the offers which match my query?
Greetings and thanks.
So the query you have actually selects the "document" just like it should. But what you are looking for is to "filter the arrays" contained so that the elements returned only match the condition of the query.
The real answer is of course that unless you are really saving a lot of bandwidth by filtering out such detail then you should not even try, or at least beyond the first positional match.
MongoDB has a positional $ operator which will return an array element at the matched index from a query condition. However, this only returns the "first" matched index of the "outer" most array element.
db.getCollection('retailers').find(
    { 'stores.offers.size': 'L' },
    { 'stores.$': 1 }
)
In this case, it means the "stores" array position only. So if there were multiple "stores" entries, then only "one" of the elements that contained your matched condition would be returned. But that does nothing for the inner array of "offers", and as such every "offer" within the matched "stores" array would still be returned.
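Against the sample document, that projection therefore still returns both offers, since only the outer "stores" element was positionally matched:
{
    "_id" : ObjectId("56f277b1279871c20b8b4567"),
    "stores" : [
        {
            "_id" : ObjectId("56f277b5279871c20b8b4783"),
            "offers" : [
                { "_id" : ObjectId("56f277b1279871c20b8b4567"), "size" : [ "XS", "S", "M" ] },
                { "_id" : ObjectId("56f277b1279871c20b8b4567"), "size" : [ "S", "L", "XL" ] }
            ]
        }
    ]
}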
MongoDB has no way of "filtering" this in a standard query, so the following does not work:
db.getCollection('retailers').find(
    { 'stores.offers.size': 'L' },
    { 'stores.$.offers.$': 1 }
)
The only tools MongoDB actually has to do this level of manipulation is with the aggregation framework. But the analysis should show you why you "probably" should not do this, and instead just filter the array in code.
Here is how you can achieve this, per version.
First with MongoDB 3.2.x with using the $filter operation:
db.getCollection('retailers').aggregate([
    { "$match": { "stores.offers.size": "L" } },
    { "$project": {
        "stores": {
            "$filter": {
                "input": {
                    "$map": {
                        "input": "$stores",
                        "as": "store",
                        "in": {
                            "_id": "$$store._id",
                            "offers": {
                                "$filter": {
                                    "input": "$$store.offers",
                                    "as": "offer",
                                    "cond": { "$setIsSubset": [ ["L"], "$$offer.size" ] }
                                }
                            }
                        }
                    }
                },
                "as": "store",
                "cond": { "$ne": [ "$$store.offers", [] ] }
            }
        }
    }}
])
Then with MongoDB 2.6.x and above with $map and $setDifference:
db.getCollection('retailers').aggregate([
    { "$match": { "stores.offers.size": "L" } },
    { "$project": {
        "stores": {
            "$setDifference": [
                { "$map": {
                    "input": {
                        "$map": {
                            "input": "$stores",
                            "as": "store",
                            "in": {
                                "_id": "$$store._id",
                                "offers": {
                                    "$setDifference": [
                                        { "$map": {
                                            "input": "$$store.offers",
                                            "as": "offer",
                                            "in": {
                                                "$cond": {
                                                    "if": { "$setIsSubset": [ ["L"], "$$offer.size" ] },
                                                    "then": "$$offer",
                                                    "else": false
                                                }
                                            }
                                        }},
                                        [false]
                                    ]
                                }
                            }
                        }
                    },
                    "as": "store",
                    "in": {
                        "$cond": {
                            "if": { "$ne": [ "$$store.offers", [] ] },
                            "then": "$$store",
                            "else": false
                        }
                    }
                }},
                [false]
            ]
        }
    }}
])
And finally, in any version from MongoDB 2.2.x, where the aggregation framework was introduced:
db.getCollection('retailers').aggregate([
    { "$match": { "stores.offers.size": "L" } },
    { "$unwind": "$stores" },
    { "$unwind": "$stores.offers" },
    { "$match": { "stores.offers.size": "L" } },
    { "$group": {
        "_id": {
            "_id": "$_id",
            "storeId": "$stores._id"
        },
        "offers": { "$push": "$stores.offers" }
    }},
    { "$group": {
        "_id": "$_id._id",
        "stores": {
            "$push": {
                "_id": "$_id.storeId",
                "offers": "$offers"
            }
        }
    }}
])
Let's break down the explanations.
MongoDB 3.2.x and greater
So generally speaking, $filter is the way to go here since it is designed with this purpose in mind. Since there are multiple levels of the array, you need to apply it at each level. So first you are diving into each "offers" within "stores" to examine and $filter that content.
The simple comparison here is "does the size array contain the element I am looking for?". In this logical context, the short way is to use the $setIsSubset operation to compare an array ("set") of ["L"] to the target array. Where that condition is true (it contains "L"), the array element for "offers" is retained and returned in the result.
In the higher level $filter, you are then looking to see if the result from that previous $filter returned an empty array [] for "offers". If it is not empty, then the element is returned, otherwise it is removed.
MongoDB 2.6.x
This is very similar to the modern process, except that since there is no $filter in this version, you can use $map to inspect each element and then use $setDifference to filter out any elements that were returned as false.
So $map is going to return the whole array, but the $cond operation decides whether to return each element or instead a false value. In the comparison of $setDifference against a single element "set" of [false], all false elements in the returned array are removed.
In all other ways, the logic is the same as above.
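As a small illustration of that trick, $setDifference against [false] simply strips the false placeholders that the $cond left behind (note these are "set" operations, so ordering of the result is not guaranteed and duplicates are removed):
{ "$setDifference": [ [ { "a": 1 }, false, { "b": 2 }, false ], [ false ] ] }
// => [ { "a": 1 }, { "b": 2 } ]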
MongoDB 2.2.x and up
So below MongoDB 2.6 the only tool for working with arrays is $unwind, and you should not be using the aggregation framework "just" for this purpose.
The process indeed appears simple: simply "take apart" each array, filter out the things you don't need, then put it back together. The main care is in the "two" $group stages, with the "first" to re-build the inner array, and the next to re-build the outer array. There are distinct _id values at all levels, so these just need to be included at every level of grouping.
But the problem is that $unwind is very costly. Though it does still have a purpose, its main usage intent is not to do this sort of filtering per document. In fact, in modern releases its only usage should be where an element of the array(s) needs to become part of the "grouping key" itself.
Conclusion
So it's not a simple process to get matches at multiple levels of an array like this, and in fact it can be extremely costly if implemented incorrectly.
Only the two modern listings should ever be used for this purpose, as they employ a "single" pipeline stage in addition to the "query" $match in order to do the "filtering". The resulting effect is little more overhead than the standard forms of .find().
In general though, those listings still have an amount of complexity to them, and indeed unless you are really drastically reducing the content returned by such filtering, in a way that makes a significant improvement in bandwidth used between the server and client, then you are better off filtering the result of the initial query and basic projection.
db.getCollection('retailers').find(
    { 'stores.offers.size': 'L' },
    { 'stores.$': 1 }
).forEach(function(doc) {
    // Technically this is only "one" store. So omit the projection
    // if you wanted more than "one" match
    doc.stores = doc.stores.filter(function(store) {
        store.offers = store.offers.filter(function(offer) {
            return offer.size.indexOf("L") != -1;
        });
        return store.offers.length != 0;
    });
    printjson(doc);
})
So working with the returned object "post" query processing is far less obtuse than using the aggregation pipeline to do this. And as stated, the only "real" difference would be that you are discarding the other elements on the "server" as opposed to removing them "per document" when received, which may save a little bandwidth.
But unless you are doing this in a modern release with only $match and $project, then the "cost" of processing on the server will greatly outweigh the "gain" of reducing that network overhead by stripping the unmatched elements first.
In all cases, you get the same result:
{
    "_id" : ObjectId("56f277b1279871c20b8b4567"),
    "stores" : [
        {
            "_id" : ObjectId("56f277b5279871c20b8b4783"),
            "offers" : [
                {
                    "_id" : ObjectId("56f277b1279871c20b8b4567"),
                    "size" : [ "S", "L", "XL" ]
                }
            ]
        }
    ]
}
As your array is embedded we cannot use $elemMatch; instead you can use the aggregation framework to get your results:
db.retailers.aggregate([
    { $match: { "stores.offers.size": 'L' } }, // just a precondition, can be skipped
    { $unwind: "$stores" },
    { $unwind: "$stores.offers" },
    { $match: { "stores.offers.size": 'L' } },
    { $group: {
        _id: { id: "$_id", "storesId": "$stores._id" },
        "offers": { $push: "$stores.offers" }
    }},
    { $group: {
        _id: "$_id.id",
        stores: { $push: { _id: "$_id.storesId", "offers": "$offers" } }
    }}
]).pretty()
What this query does is unwind the arrays (twice), then match the size, and then reshape the documents back to their previous form. You can remove the $group steps and see what it prints.
Have fun!
This also works without aggregate(), since from MongoDB 4.4 a find() projection can accept aggregation expressions such as $filter.
Here is the solution link: https://mongoplayground.net/p/Q5lxPvGK03A
db.collection.find(
    { "stores.offers.size": "L" },
    {
        "stores": {
            "$filter": {
                "input": {
                    "$map": {
                        "input": "$stores",
                        "as": "store",
                        "in": {
                            "_id": "$$store._id",
                            "offers": {
                                "$filter": {
                                    "input": "$$store.offers",
                                    "as": "offer",
                                    "cond": { "$setIsSubset": [ ["L"], "$$offer.size" ] }
                                }
                            }
                        }
                    }
                },
                "as": "store",
                "cond": { "$ne": [ "$$store.offers", [] ] }
            }
        }
    }
)

Finding documents based on the minimum value in an array

My document structure is something like:
{
    _id: ...,
    key1: ....,
    key2: ....,
    ....
    min_value: // should be the minimum of all the values in options
    options: [
        { source: 'a', value: 12 },
        { source: 'b', value: 10 },
        ...
    ]
},
{
    _id: ...,
    key1: ....,
    key2: ....,
    ....
    min_value: // should be the minimum of all the values in options
    options: [
        { source: 'a', value: 24 },
        { source: 'b', value: 36 },
        ...
    ]
}
The value of the various sources in options will keep getting updated on a frequent basis (every few minutes or hours).
Assume the size of the options array doesn't change, i.e. no extra elements are added to the list.
My queries are of the following type:
- find all documents where the min_value of all the options falls between some limits.
I could first do an unwind on options (and then take the min) and then run comparison queries, but I am new to Mongo and not sure how performance is affected by the unwind operation. The number of documents of this type would be about a few million.
Or does anyone have any suggestions around changing the document structure which could help me simplify this query? (apart from creating separate documents per source - that would involve a lot of data duplication)
Thanks!
Using $unwind is indeed quite expensive, most notably so with larger arrays, but there is a cost in all cases of usage. There are a couple of ways to approach this without needing $unwind and without real structural changes.
Pure Aggregation
In the basic case, as of the MongoDB 3.2.x release series, the $min operator can work directly on an array of values in a "projection" sense, in addition to its standard grouping accumulator role. This means that with the help of the related $map operator for processing elements of an array, you can get the minimal value without using $unwind:
db.collection.aggregate([
    // Still makes sense to use an index to select only possible documents
    { "$match": {
        "options": {
            "$elemMatch": { "value": { "$gte": minValue, "$lt": maxValue } }
        }
    }},
    // Provides a logical filter to remove non-matching documents
    { "$redact": {
        "$cond": {
            "if": {
                "$let": {
                    "vars": {
                        "min_value": {
                            "$min": {
                                "$map": {
                                    "input": "$options",
                                    "as": "option",
                                    "in": "$$option.value"
                                }
                            }
                        }
                    },
                    "in": { "$and": [
                        { "$gte": [ "$$min_value", minValue ] },
                        { "$lt": [ "$$min_value", maxValue ] }
                    ]}
                }
            },
            "then": "$$KEEP",
            "else": "$$PRUNE"
        }
    }},
    // Optionally return the min_value as a field
    { "$project": {
        "min_value": {
            "$min": {
                "$map": {
                    "input": "$options",
                    "as": "option",
                    "in": "$$option.value"
                }
            }
        }
    }}
])
The basic case is to get the "minimum" value from the array. This is done inside of $let, since we want to use the result "twice" in the logical conditions, which helps us not repeat ourselves. The first step is to extract just the "value" data from the "options" array, which is done using $map.
The output of $map is an array with just those values, so this is supplied as the argument to $min, which then returns the minimum value for that array.
Using $redact is sort of like a $match pipeline stage with the difference that rather than needing a field to be "present" in the document being examined, you instead just form a logical condition with calculations.
In this case the condition is $and where "both" the logical forms of $gte and $lt return true against the calculated value ( from $let as "$$min_value" ).
The $redact stage then has the special arguments to apply to $$KEEP the document when the condition is true or $$PRUNE the document from results when it is false.
It's all very much like doing $project and then $match to actually project the value into the document before filtering in another stage, but all done in one stage. Of course you might actually want to $project the resulting field in what you return, but it generally cuts the workload if you remove non-matched documents "first" using $redact instead.
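For reference, on MongoDB 3.6 or newer the same logical filter can be expressed with $expr directly in an ordinary $match stage rather than $redact. A minimal sketch, using the same minValue / maxValue placeholders as above:
db.collection.aggregate([
    { "$match": {
        "$expr": {
            "$let": {
                "vars": {
                    "min_value": {
                        "$min": {
                            "$map": { "input": "$options", "as": "option", "in": "$$option.value" }
                        }
                    }
                },
                "in": { "$and": [
                    { "$gte": [ "$$min_value", minValue ] },
                    { "$lt": [ "$$min_value", maxValue ] }
                ]}
            }
        }
    }}
])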
Updating Documents
Of course I think the best option is to actually keep the "min_value" field in the document rather than work it out at run-time. So this is a very simple thing to do when adding to or altering array items during update.
For this there is the $min "update" operator. Use it when appending with $push:
db.collection.update(
    { "_id": id },
    {
        "$push": { "options": { "source": "a", "value": 9 } },
        "$min": { "min_value": 9 }
    }
)
Or when updating a value of an element:
db.collection.update(
    { "_id": id, "options.source": "a" },
    {
        "$set": { "options.$.value": 9 },
        "$min": { "min_value": 9 }
    }
)
If the current "min_value" in the document is greater than the argument in $min or the key does not yet exist then the value given will be written. If it is greater than, the existing value stays in place since it is already the smaller value.
You can even set all your existing data with a simple "bulk" operations update:
var ops = [];
db.collection.find({ "min_value": { "$exists": false } }).forEach(function(doc) {
    // Queue operations
    ops.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": {
                "$min": {
                    "min_value": Math.min.apply(
                        null,
                        doc.options.map(function(option) {
                            return option.value;
                        })
                    )
                }
            }
        }
    });
    // Write once every 1000 documents
    if ( ops.length == 1000 ) {
        db.collection.bulkWrite(ops);
        ops = [];
    }
});
// Clear any remaining operations
if ( ops.length > 0 )
    db.collection.bulkWrite(ops);
Then with a field in place, it is just a simple range selection:
db.collection.find({
    "min_value": { "$gte": minValue, "$lt": maxValue }
})
So it really should be in your best interests to keep a field ( or fields if you regularly need different conditions ) in the document since that provides the most efficient query.
Of course, the new aggregation functions of $min along with $map also make this viable without a stored field, if you prefer more dynamic conditions.

Turn _ids into Keys of New Object

I have a huge bunch of documents as such:
{
    _id: '1abc',
    colors: [
        { value: 'red', count: 2 },
        { value: 'blue', count: 3 }
    ]
},
{
    _id: '2abc',
    colors: [
        { value: 'red', count: 7 },
        { value: 'blue', count: 34 },
        { value: 'yellow', count: 12 }
    ]
}
Is it possible to make use of aggregate() to get the following?
{
    _id: 'null',
    colors: {
        "1abc": [
            { value: 'red', count: 2 },
            { value: 'blue', count: 3 }
        ],
        "2abc": [
            { value: 'red', count: 7 },
            { value: 'blue', count: 34 },
            { value: 'yellow', count: 12 }
        ]
    }
}
Basically, is it possible to turn all of the original documents' _ids into keys of a new object in the singular new aggregated document?
So far, when trying to use $group, I have not been able to use a variable value, e.g. $_id, on the left-hand side of an assignment. Am I missing something or is it simply impossible?
I can do this easily using JavaScript but it is unbearably slow. Hence I am looking to see whether it is possible using Mongo's native aggregate(), which will probably be faster.
If impossible... I would appreciate any kind suggestions that could point towards a sufficient alternative (change structure, etc.?). Thank you!
Like I said in the comments, whilst there are things you can do with the aggregation framework or even mapReduce to make the "server" reshape the response, it's kind of silly to do so.
Let's consider the cases:
Aggregate
db.collection.aggregate([
    { "$match": { "_id": { "$in": [ "1abc", "2abc" ] } } },
    { "$group": {
        "_id": null,
        "result": { "$push": "$$ROOT" }
    }},
    { "$project": {
        "colors": {
            "1abc": {
                "$arrayElemAt": [
                    { "$map": {
                        "input": {
                            "$filter": {
                                "input": "$result",
                                "as": "r",
                                "cond": { "$eq": [ "$$r._id", "1abc" ] }
                            }
                        },
                        "as": "r",
                        "in": "$$r.colors"
                    }},
                    0
                ]
            },
            "2abc": {
                "$arrayElemAt": [
                    { "$map": {
                        "input": {
                            "$filter": {
                                "input": "$result",
                                "as": "r",
                                "cond": { "$eq": [ "$$r._id", "2abc" ] }
                            }
                        },
                        "as": "r",
                        "in": "$$r.colors"
                    }},
                    0
                ]
            }
        }
    }}
])
So the aggregation framework purely does not dynamically generate "keys" of a document. If you want to process this way, then you need to know all of the "values" that you are going to use to make the keys in the result.
After putting everything into one document with $group, you can then work with the result array to extract data for your "keys". The basic operators here are:
$filter to get the matched element of the array for the "value" that you want.
$map to return just the specific property from the filtered array
$arrayElemAt to just grab the single element that was filtered out of the resulting mapped array
So it really isn't practical in a lot of cases, and the coding of the statement is fairly involved.
MapReduce
db.collection.mapReduce(
    function() {
        var obj = { "colors": {} };
        obj.colors[this._id] = this.colors;
        emit(null, obj);
    },
    function(key, values) {
        var obj = { "colors": {} };
        values.forEach(function(value) {
            Object.keys(value.colors).forEach(function(key) {
                obj.colors[key] = value.colors[key];
            });
        });
        return obj;
    },
    { "out": { "inline": 1 } }
)
Since it is actually written in a "language", you have the ability to loop structures and "build things" in a more dynamic way.
However, close inspection should tell you that the "reducer" function here is not doing anything more than processing "all the results" that have been "stuffed into it" by each emitted document.
That means that "iterating the values" fed to the reducer is really no different to "iterating the cursor", and that leads to the next conclusion.
Cursor Iteration
var result = { "colors": {} };
db.collection.find().forEach(function(doc) {
result.colors[doc._id] = doc.colors;
})
printjson(result)
The simplicity of this should really speak volumes. It is after all doing exactly what you are trying to "shoehorn" into a server operation and nothing more; it just simply "rolls up its sleeves" and gets on with the task at hand.
The key point here is none of the process requires any "aggregation" in a real sense, that cannot be equally achieved by simply iterating the cursor and building up the response document.
This is really why you always need to look at what you are doing and choose the right method. "Server side" aggregation has a primary task of "reducing" a result so you would not need to iterate a cursor. But nothing here "reduces" anything. It's just all of the data, transformed into a different format.
Therefore the simple approach for this type of "transform" is to just iterate the cursor and build up your transformed version of "all the results" anyway.

MongoDB sum arrays from multiple documents on a per-element basis

I have the following document structure (simplified for this example)
{
    _id : ObjectId("sdfsdf"),
    result : [1, 3, 5, 7, 9]
},
{
    _id : ObjectId("asdref"),
    result : [2, 4, 6, 8, 10]
}
I want to get the sum of those result arrays, but not a total sum; instead a new array corresponding to the sum of the original arrays on an element basis, i.e.
result : [3, 7, 11, 15, 19]
I have searched through the myriad questions here and a few come close, but I can't quite get there.
I can get the sum of each array fine:
aggregate([
    { "$unwind": "$result" },
    { "$group": {
        "_id": "$_id",
        "results": { "$sum": "$result" }
    }}
])
which gives me
[ { _id: sdfsdf, results: 25 },
{ _id: asdref, results: 30 } ]
but I can't figure out how to get the sum of each element
You can use includeArrayIndex if you have MongoDB 3.2 or newer.
Then you should change the $unwind.
Your code should be like this:
.aggregate([
    { "$unwind": { "path": "$result", "includeArrayIndex": "arrayIndex" } },
    { "$group": {
        "_id": "$arrayIndex",
        "results": { "$sum": "$result" }
    }},
    { "$sort": { "_id": 1 } },
    { "$group": {
        "_id": null,
        "results": { "$push": "$results" }
    }},
    { "$project": { "_id": 0, "results": 1 } }
])
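To see why this works: after the $unwind stage each array element is emitted as its own document together with its original position, so grouping on that position sums the arrays element-wise. For the sample data, the first documents out of $unwind would look like:
{ "_id" : ObjectId("sdfsdf"), "result" : 1, "arrayIndex" : NumberLong(0) }
{ "_id" : ObjectId("sdfsdf"), "result" : 3, "arrayIndex" : NumberLong(1) }
{ "_id" : ObjectId("sdfsdf"), "result" : 5, "arrayIndex" : NumberLong(2) }
and the first $group then produces { "_id": NumberLong(0), "results": 3 }, { "_id": NumberLong(1), "results": 7 }, and so on.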
There is an alternate approach to this, but mileage may vary on how practical it is. It involves using $push to create an "array of arrays" and then applying $reduce, as introduced in MongoDB 3.4, to $sum those array elements into a single array result:
db.collection.aggregate([
    { "$group": {
        "_id": null,
        "result": { "$push": "$result" }
    }},
    { "$addFields": {
        "result": {
            "$reduce": {
                "input": "$result",
                "initialValue": [],
                "in": {
                    "$map": {
                        "input": {
                            "$zip": {
                                "inputs": [ "$$this", "$$value" ],
                                "useLongestLength": true
                            }
                        },
                        "as": "el",
                        "in": { "$sum": "$$el" }
                    }
                }
            }
        }
    }}
])
The real trick there is in the "input" to $map, where we use the $zip operation, which creates a transposed list of arrays "pairwise" from the two array inputs.
In the first iteration, this takes the empty array as supplied to $reduce and returns the "zipped" output with consideration to the first array found, as in:
[ [0,1], [0,3], [0,5], [0,7], [0,9] ]
So useLongestLength would substitute the empty array with 0 values out to the length of the current array and "zip" them together as above.
Processing with $map, each element is subject to $sum which "reduces" the returned results as:
[ 1, 3, 5, 7, 9 ]
On the second iteration, the next entry in the "array of arrays" would be picked up and processed by $zip along with the previous "reduced" content as:
[ [1,2], [3,4], [5,6], [7,8], [9,10] ]
Which is then subject to the $map for each element using $sum again to produce:
[ 3, 7, 11, 15, 19 ]
And since there were only two arrays pushed into the "array of arrays" that is the end of the operation, and the final result. But otherwise the $reduce would keep iterating until all array elements of the input were processed.
So in some cases this would be the more performant option and what you should be using. But it is noted that, particularly when using null for $group, you are asking "every" document to $push content into an array for the result.
This could breach the 16MB BSON document limit in extreme cases, and therefore when aggregating positional array content over large results it is probably best to use $unwind with the includeArrayIndex option instead.
Or indeed actually take a good look at the process, where in particular if the "positional array" in question is actually the result of some other "aggregation operation", then you should rather be looking at the previous pipeline stages that were used to create the "positional array". And then consider that if you wanted those positions "aggregated further" to new totals, then you should in fact do that "before" the positional result was obtained.