MongoDB advanced aggregation - mongodb

I'm a total newbie to MongoDB. I work on a private project for my golf club to analyze rounds.
I use MeteorJS for the application and tried some aggregation on the command line, but I'm not sure I'm even approaching the task the right way.
A sample document:
{
"_id" : "2KasYR3ytsaX8YuoT",
"course" : {
"id" : "rHmYJBhRtSt38m68s",
"name" : "CourseXYZ"
},
"player" : {
"id" : "tdaYaSvXJueDq4oTN",
"firstname" : "Just",
"lastname" : "aPlayer"
},
"event" : "Training Day",
"tees" : [
{
"tee" : 1,
"par" : 4,
"fairway" : "straight",
"greenInRegulation" : true,
"putts" : 3,
"strokes" : 5
},
{
"tee" : 2,
"par" : 5,
"fairway" : "right",
"greenInRegulation" : true,
"putts" : 2,
"strokes" : 5
},
{
"tee" : 3,
"par" : 5,
"fairway" : "right",
"greenInRegulation" : false,
"shotType": "bunker",
"putts" : 2,
"strokes" : 5
}
]
}
My attempt so far:
db.analysis.aggregate([
{$unwind: "$tees"},
{$group: {
_id:"$player.id",
strokes: {$sum: "$tees.strokes"},
par: {$sum: "$tees.par"},
putts: {$sum: "$tees.putts"},
teesPlayed: {$sum:1}
}}
])
And what I want for a result
{
"_id" : "tdaYaSvXJueDq4oTN",
"strokes" : 15,
"par" : 14,
"putts" : 7,
"teesPlayed" : 3
// here comes what I want to add:
"fairway.straight": 1 // where tees.fairway equals "straight"
"fairway.right": 2 // where tees.fairway equals "right" (etc.)
"shotType.bunker": 1 // where shotType equals "bunker" etc.
}

There are a few ways of approaching this depending on your overall needs and which MongoDB server version you have available as a target for your project.
Whilst "meteor" installations and default project setups do not "bundle" a MongoDB 3.2 instance, there is no reason why your project cannot use such an instance as an external target. If it's a new project to get off the ground, then I would highly recommend working against the latest version available, and possibly even against the latest development releases, depending on your own targeted release cycle. Work with what is most fresh, and your application will be too.
For that reason, we start with the latest at the top of the list.
MongoDB 3.2 way - Fast
The big feature in MongoDB 3.2 that makes it really stand out here in terms of performance is a change in how $sum operates. Previously just as an accumulator operator for $group this would work on singular numeric values to produce a total.
The big improvement is hidden within the $project stage usage which is added, where $sum can be directly applied to an array of values. i.e { "$sum": [1,2,3] } results in 6. So now you can "nest" the operations with anything that produces an array of values from a source. Most notably here is $map:
db.analysis.aggregate([
{ "$group": {
"_id": "$player.id",
"strokes": {
"$sum": {
"$sum": {
"$map": {
"input": "$tees",
"as": "tee",
"in": "$$tee.strokes"
}
}
}
},
"par": {
"$sum": {
"$sum": {
"$map": {
"input": "$tees",
"as": "tee",
"in": "$$tee.par"
}
}
}
},
"putts": {
"$sum": {
"$sum": {
"$map": {
"input": "$tees",
"as": "tee",
"in": "$$tee.putts"
}
}
}
},
"teesPlayed": { "$sum": { "$size": "$tees" } },
"shotsRight": {
"$sum": {
"$size": {
"$filter": {
"input": "$tees",
"as": "tee",
"cond": { "$eq": [ "$$tee.fairway", "right" ] }
}
}
}
},
"shotsStraight": {
"$sum": {
"$size": {
"$filter": {
"input": "$tees",
"as": "tee",
"cond": { "$eq": [ "$$tee.fairway", "straight" ] }
}
}
}
},
"bunkerShot": {
"$sum": {
"$size": {
"$filter": {
"input": "$tees",
"as": "tee",
"cond": { "$eq": [ "$$tee.shotType", "bunker" ] }
}
}
}
}
}}
])
So here each field is produced in one of two ways: either by the double $sum trick on the single field values from the array items, or, for the result fields that want "counts", by processing the array with $filter to restrict it to matching items and then taking the length of the matches with $size.
Though this looks long-winded in pipeline construction, it will yield the fastest results. And though you need to specify all of the result keys with the associated logic, there is nothing stopping "generation" of the data structure necessary for the pipeline as the result of other queries on the data set.
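As a rough sketch of that "generation" idea, the $group stage can be assembled in code from lists of values. The helper name buildGroupStage and the underscore key naming are assumptions made for illustration, and the value lists are assumed to come from something like db.analysis.distinct("tees.fairway"):

```javascript
// Hypothetical helper: builds the MongoDB 3.2 style $group stage from lists of
// values. The lists are assumed to come from queries such as
// db.analysis.distinct("tees.fairway") and db.analysis.distinct("tees.shotType").
function buildGroupStage(fairways, shotTypes) {
  // Sum one numeric field across the "tees" array of every grouped document
  const sumOf = (field) => ({
    "$sum": {
      "$sum": {
        "$map": { "input": "$tees", "as": "tee", "in": "$$tee." + field }
      }
    }
  });
  // Count the array elements whose `field` equals `value`
  const countOf = (field, value) => ({
    "$sum": {
      "$size": {
        "$filter": {
          "input": "$tees",
          "as": "tee",
          "cond": { "$eq": [ "$$tee." + field, value ] }
        }
      }
    }
  });

  const group = {
    "_id": "$player.id",
    "strokes": sumOf("strokes"),
    "par": sumOf("par"),
    "putts": sumOf("putts"),
    "teesPlayed": { "$sum": { "$size": "$tees" } }
  };
  // $group output keys must be top level, so use "fairway_right" style names
  fairways.forEach((v) => { group["fairway_" + v] = countOf("fairway", v); });
  shotTypes.forEach((v) => { group["shotType_" + v] = countOf("shotType", v); });
  return { "$group": group };
}

const stage = buildGroupStage(["straight", "right"], ["bunker"]);
// The stage could then be run as: db.analysis.aggregate([ stage ])
```

Values containing "." or starting with "$" would not make valid key names, so a real implementation would need to sanitize them first.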
The other Aggregate Way - A bit slower
Of course not every project can practically use the latest version of things. So before a MongoDB 3.2 release that introduced some of the operators used above, the only real practical way to work with array data and conditionally work with different elements and sums was to process first with $unwind.
So essentially we start with the query you began to construct, but then add in the handling for the different fields:
db.analysis.aggregate([
{ "$unwind": "$tees" },
{ "$group": {
"_id": "$player.id",
"strokes": { "$sum": "$tees.strokes" },
"par": { "$sum": "$tees.par" },
"putts": { "$sum": "$tees.putts" },
"teesPlayed": { "$sum": 1 },
"shotsRight": {
"$sum": {
"$cond": [
{ "$eq": [ "$tees.fairway", "right" ] },
1,
0
]
}
},
"shotsStraight": {
"$sum": {
"$cond": [
{ "$eq": [ "$tees.fairway", "straight" ] },
1,
0
]
}
},
"bunkerShot": {
"$sum": {
"$cond": [
{ "$eq": [ "$tees.shotType", "bunker" ] },
1,
0
]
}
}
}}
])
So you should note that there is still "some" similarity to the first listing: where the $filter statements all have some logic within their "cond" argument, that logic is transposed to the $cond operator here.
As a "ternary" operator (if/then/else), its job is to evaluate a logical condition (if) and either return the next argument where that condition is true (then) or otherwise return the last argument where it is false (else). In this case that is either 1 or 0, depending on whether the tested condition matched. This gives the "counts" to $sum as required.
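To make the 1-or-0 counting concrete, here is a plain JavaScript stand-in for the same idea, run over the sample "tees" array from the question. The function name countWhere is made up for illustration and is not a MongoDB API:

```javascript
// Sample "tees" array from the question, reduced to the fields used here
const tees = [
  { tee: 1, fairway: "straight" },
  { tee: 2, fairway: "right" },
  { tee: 3, fairway: "right", shotType: "bunker" }
];

// Plain JavaScript stand-in for
// { "$sum": { "$cond": [ { "$eq": [ field, value ] }, 1, 0 ] } }
function countWhere(items, field, value) {
  // Each element contributes 1 when the condition holds, otherwise 0
  return items.reduce(
    (total, item) => total + (item[field] === value ? 1 : 0),
    0
  );
}

const shotsRight = countWhere(tees, "fairway", "right");       // 2
const shotsStraight = countWhere(tees, "fairway", "straight"); // 1
const bunkerShot = countWhere(tees, "shotType", "bunker");     // 1
```

Elements without a "shotType" field simply contribute 0, which mirrors how the $eq test fails for a missing field.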
In either statement, the produced results come out like this:
{
"_id" : "tdaYaSvXJueDq4oTN",
"strokes" : 15,
"par" : 14,
"putts" : 7,
"teesPlayed" : 3,
"shotsRight" : 2,
"shotsStraight" : 1,
"bunkerShot" : 1
}
Since this is an aggregate statement with $group, then one rule is that the "keys" ( apart from needing to be specified in the constructed statement ) must be in the "top-level" of the structure. So no "nested" structures are allowed within a $group, hence the whole names for each key.
If you really must transform, then you can by adding a $project stage following the $group in each example:
{ "$project": {
"strokes": 1,
"par": 1,
"putts": 1,
"teesPlayed": 1,
"fairway": {
"straight": "$shotsStraight",
"right": "$shotsRight"
},
"shotType": {
"bunker": "$bunkerShot"
}
}}
So a bit of "re-shaping" can be done, but of course all the names and structure must be specified, though again you could in theory just generate this all in code. It is just a data structure after all.
The bottom line here is that $unwind adds cost, and quite a lot of it. It adds a copy of each document to the pipeline for every array element the document contains. So not only is there the cost of processing all of those copies, but also the cost of "producing" them in the first place.
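The copying can be pictured with a small plain JavaScript sketch. The unwind function below is a hypothetical stand-in for the $unwind stage, not the server implementation, and the two sample rounds are invented:

```javascript
// Hypothetical stand-in for the $unwind stage: one output document per array element
function unwind(docs, field) {
  const out = [];
  docs.forEach((doc) => {
    (doc[field] || []).forEach((element) => {
      // Every output carries a copy of all the other fields of the source document
      out.push(Object.assign({}, doc, { [field]: element }));
    });
  });
  return out;
}

const rounds = [
  { _id: "a", player: { id: "p1" }, tees: [ { strokes: 5 }, { strokes: 5 }, { strokes: 5 } ] },
  { _id: "b", player: { id: "p1" }, tees: [ { strokes: 4 }, { strokes: 6 } ] }
];

const unwound = unwind(rounds, "tees");
// unwound.length is 5: two source documents became five pipeline documents
```

With 18 tees per round the pipeline would carry 18 documents for every stored round before $group even starts.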
MapReduce - Slower still, but more flexible on the keys
And finally, as an approach, there is mapReduce:
db.analysis.mapReduce(
function() {
var data = { "strokes": 0 ,"par": 0, "putts": 0, "teesPlayed": 0, "fairway": {} };
this.tees.forEach(function(tee) {
// Increment common values
data.strokes += tee.strokes;
data.par += tee.par;
data.putts += tee.putts;
data.teesPlayed++;
// Do dynamic keys
if (!data.fairway.hasOwnProperty(tee.fairway))
data.fairway[tee.fairway] = 0;
data.fairway[tee.fairway]++;
if (tee.hasOwnProperty('shotType')) {
if (!data.hasOwnProperty('shotType'))
data.shotType = {};
if (!data.shotType.hasOwnProperty(tee.shotType))
data.shotType[tee.shotType] = 0;
data.shotType[tee.shotType]++
}
});
emit(this.player.id,data);
},
function(key,values) {
var data = { "strokes": 0 ,"par": 0, "putts": 0, "teesPlayed": 0, "fairway": {} };
values.forEach(function(value) {
// Common keys
data.strokes += value.strokes;
data.par += value.par;
data.putts += value.putts;
data.teesPlayed += value.teesPlayed;
Object.keys(value.fairway).forEach(function(fairway) {
if (!data.fairway.hasOwnProperty(fairway))
data.fairway[fairway] = 0;
data.fairway[fairway] += value.fairway[fairway];
});
if (value.hasOwnProperty('shotType')) {
if (!data.hasOwnProperty('shotType'))
data.shotType = {};
Object.keys(value.shotType).forEach(function(shotType) {
if (!data.shotType.hasOwnProperty(shotType))
data.shotType[shotType] = 0;
data.shotType[shotType] += value.shotType[shotType];
});
}
});
return data;
},
{ "out": { "inline": 1 } }
)
And the output from this can be done immediately with the nested structure, but of course in the very mapReduce output form of "key/value", being that "key" is the grouping _id and "value" contains all the output:
{
"_id" : "tdaYaSvXJueDq4oTN",
"value" : {
"strokes" : 15,
"par" : 14,
"putts" : 7,
"teesPlayed" : 3,
"fairway" : {
"straight" : 1,
"right" : 2
},
"shotType" : {
"bunker" : 1
}
}
}
The "out" options for mapReduce are either the "inline" as shown here where you can fit all the result in memory ( and within the 16MB BSON limit ), or alternately to another collection from which you can read later. There is a similar $out for .aggregate(), but this is generally negated by aggregation output being available as a "cursor", unless of course you really want it in a collection instead.
Concluding
So it all depends on how you really want to approach this. If speed is of the utmost importance then .aggregate() is generally going to yield the fastest results. On the other hand, if you want to work "dynamically" with the produced "keys", then mapReduce allows the logic to be generally self-contained, without the need for another inspection pass to generate the required aggregation pipeline statement.

I am not clear how to do that through aggregation; however, there is one workaround:
> db.collection.find({}).forEach(function(doc) {
var ret = {};
ret._id = doc._id;
doc.tees.forEach(function(obj) {
for (var k in obj) {
var type = typeof obj[k];
if (type === 'number') {
if (ret.hasOwnProperty(k)) {
ret[k] += obj[k];
} else {
ret[k] = obj[k];
}
} else if (type === 'string') {
if (ret.hasOwnProperty(k+'.'+obj[k])) {
ret[k+'.'+obj[k]] += 1;
} else {
ret[k+'.'+obj[k]] = 1;
}
}
}
});
printjson(ret);
});

Related

MongoDB Property String Starts With Query And Set Group Id

Data structure - one document in a big collection:
{
OPERATINGSYSTEM: "Android 6.0"
}
Issue: The operatingsystem can equal e.g. "Android 5.0", "Android 6.0", "Windows Phone", "Windows Phone 8.1"
There is no property which only contains the kind of operating system e.g. only Android.
I need to get the count of windows phones, and android phones.
My temporary solution:
db.getCollection('RB').find(
{OPERATINGSYSTEM: {$regex: "^Android"}}
).count();
I'm doing that query replacing "^Android" by windows phone and so on which takes much time and needs to be done in parallel.
Using the aggregation framework I thought about this:
db.RB.aggregate(
{$group: {_id: {OPERATINGSYSTEM:"$OPERATINGSYSTEM"}}},)
But using this I get an entry for each operatingsystem version Android 5.0, Android 6.0 etc...
The solution I'm searching for should return data in this format:
{
"Android": 50,
"Windows Phone": 100
}
How can this be done in a single query?
Provided your strings at least consistently have the numeric version as the last thing in the string, then you could use $split with the aggregation framework to make an array from the "space delimited" content, then remove the last element from the array before reconstructing:
Given data like :
{ "name" : "Android 6.0" }
{ "name" : "Android 7.0" }
{ "name" : "Windows Phone 10" }
You can try:
db.getCollection('phones').aggregate([
{ "$group": {
"_id": {
"$let": {
"vars": { "split": { "$split": [ "$name", " " ] } },
"in": {
"$reduce": {
"input": { "$slice": [ "$$split", 0, { "$subtract": [ { "$size": "$$split" }, 1 ] } ] },
"initialValue": "",
"in": {
"$cond": {
"if": { "$eq": [ "$$value", "" ] },
"then": "$$this",
"else": { "$concat": [ "$$value", " ", "$$this" ] }
}
}
}
}
}
},
"count": { "$sum": 1 }
}},
{ "$replaceRoot": {
"newRoot": {
"$arrayToObject": [[{ "k": "$_id", "v": "$count" }]]
}
}}
])
That's all possible if your MongoDB is at least MongoDB 3.4 to support both $split and $reduce. The $replaceRoot is really about naming the keys, and not really required.
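For comparison, the string handling in that pipeline corresponds roughly to the following plain JavaScript. The function name osFamily is an assumption for illustration only:

```javascript
// Drop the last space-delimited token, as the $split / $slice / $reduce
// expression in the pipeline does
function osFamily(name) {
  const parts = name.split(" ");
  return parts.slice(0, parts.length - 1).join(" ");
}

const names = ["Android 6.0", "Android 7.0", "Windows Phone 10"];
const counts = {};
names.forEach((name) => {
  const key = osFamily(name);
  counts[key] = (counts[key] || 0) + 1;
});
// counts is { "Android": 2, "Windows Phone": 1 }
```

Note that a value with no trailing version, such as "Windows Phone", would lose its last word and be counted as "Windows", so this approach depends on the version always being present.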
Alternately you can use mapReduce:
db.getCollection('phones').mapReduce(
function() {
var re = /\d+/g;
emit(this.name.substr(0,this.name.search(re)-1),1);
},
function(key,values) { return Array.sum(values) },
{ "out": { "inline": 1 } }
)
Where it's easier to break down the string by the index where a numeric value occurs. In either case, you are not required to "hardcode" anything, and the values of the keys are completely dependent on the strings in context.
Keep in mind though that unless there is an extremely large number of possible values, then running parallel .count() operations "should" be the fastest to process since returning cursor counts is a lot faster than actually counting the aggregated entries.
You can use map reduce, and apply your logic in the map function.
var map = function(){
var name = /android/i.test(this.op) ? "Android" : ""; // case-insensitive, so "Android 6.0" matches
if(name === ""){
name = /windows/i.test(this.op) ? "Windows" : "";
}
emit(name, 1);
}
var reduce = function(key, values){
return Array.sum(values)
}
db.operating.mapReduce(map, reduce, {out: "total"})
https://docs.mongodb.com/manual/tutorial/map-reduce-examples/

How can I unset all document properties except for one or two in mongodb?

I have a collection of documents about entities that have status property that could be 1 or 0. Every document contains a lot of data and occupies space.
I want to get rid of most of the data on the documents with status equal 0.
So, I want every document in the collection that looks like
{
_id: 234,
myCode: 101,
name: "sfsdf",
status: 0,
and: 23243423.1,
a: "dsf",
lot: 3234,
more: "efsfs",
properties: "sdfsd"
}
...to be a lot smaller
{
_id: 234,
myCode: 101,
status: 0
}
So, basically I can do
db.getCollection('docs').update(
{'statusCode': 0},
{
$unset: {
and: "",
a: "",
lot: "",
more: "",
properties: ""
}
},
{multi:true}
)
But there are about 40 properties which would be a huge list, and also I'm not sure that all the objects follow the same schema.
Is there a way to unset all except two properties?
The best thing to do here is to actually throw all the possible properties at $unset and let it do its job. You cannot "wildcard" such arguments, so there really is not a better way without writing to another collection.
If you don't want to type them all out or even know all of them, then simply perform a process to "collect" all the other top level properties.
You can do this for example with .mapReduce():
var fields = db.getCollection('docs').mapReduce(
function() {
Object.keys(this)
.filter(k => k !== '_id' && k !== 'myCode')
.forEach( k => emit(k,1) )
},
function() {},
{
"out": { "inline": 1 }
}
).results.map( o => o._id )
.reduce((acc,curr) => Object.assign(acc,{ [curr]: "" }),{})
Gives you an object with the full fields list to provide to $unset as:
{
"a" : "",
"and" : "",
"lot" : "",
"more" : "",
"name" : "",
"properties" : "",
"status" : ""
}
And that is taken from all possible top level fields in the whole collection.
You can do the same thing with .aggregate() in MongoDB 3.4 using $objectToArray:
var fields = db.getCollection('docs').aggregate([
{ "$project": {
"fields": {
"$filter": {
"input": { "$objectToArray": "$$ROOT" },
"as": "d",
"cond": {
"$and": [
{ "$ne": [ "$$d.k", "_id" ] },
{ "$ne": [ "$$d.k", "myCode" ] }
]
}
}
}
}},
{ "$unwind": "$fields" },
{ "$group": {
"_id": "$fields.k"
}}
]).map( o => o._id )
.reduce((acc,curr) => Object.assign(acc,{ [curr]: "" }),{});
Whatever way you obtain the list of names, then simply send them to $unset:
db.getCollection('docs').update(
{ "statusCode": 0 },
{ "$unset": fields },
{ "multi": true }
)
The bottom line is that $unset does not care whether the properties are present in the document or not; it simply removes them where they exist.
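The field collection and the "ignore what is missing" behaviour can be sketched in plain JavaScript. Both helper names are hypothetical, the two sample documents are invented, and the keep list here also includes "status" on the assumption that it should survive, as in the desired output of the question:

```javascript
// Hypothetical helper: collect every top-level field name seen in `docs`,
// except the ones in `keep`, as an argument for $unset
function buildUnsetSpec(docs, keep) {
  const spec = {};
  docs.forEach((doc) => {
    Object.keys(doc)
      .filter((k) => keep.indexOf(k) === -1)
      .forEach((k) => { spec[k] = ""; });
  });
  return spec;
}

// Stand-in for $unset on one document: fields missing from the document are skipped
function applyUnset(doc, spec) {
  const copy = Object.assign({}, doc);
  Object.keys(spec).forEach((k) => { delete copy[k]; });
  return copy;
}

const docs = [
  { _id: 234, myCode: 101, name: "sfsdf", status: 0, lot: 3234 },
  { _id: 235, myCode: 102, status: 0, more: "efsfs" }
];
const spec = buildUnsetSpec(docs, ["_id", "myCode", "status"]);
// spec is { name: "", lot: "", more: "" }
const trimmed = docs.map((doc) => applyUnset(doc, spec));
// trimmed is [ { _id: 234, myCode: 101, status: 0 }, { _id: 235, myCode: 102, status: 0 } ]
```

The same spec works for both documents even though neither contains all three of the listed fields.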
The alternate case is to simply write everything out to a new collection if that also suits your needs. This is a simple use of $out as an aggregation pipeline stage:
db.getCollection('docs').aggregate([
{ "$match": { "statusCode": 0 } },
{ "$project": { "myCode": 1 } },
{ "$out": "newdocs" }
])

Most efficient way to change a string field value to its substring

I have a collection filled with documents that look like this:
{
data: 11,
version: "0.0.32"
}
and some have a test suffix to version:
{
data: 55,
version: "0.0.42-test"
}
The version field has different values but it always conforms to the pattern: 0.0.XXX. I would like to update all the documents to look like this:
{
data: 11,
version: 32
}
and the suffixed version (for test documents - version should be negative):
{
data: 55,
version: -42
}
The collection with these documents is used by our critical system, that needs to be turned off while updating the data - so I want the update/change to be as fast as possible. There are about 66_000_000 documents in this collection, and it's about 100GB in size.
Which type of mongodb operation would be the most efficient one?
The most efficient way to do this is available in MongoDB 3.4 (the upcoming release as of this writing): use the $split operator to split the string, then assign the last element of the resulting array to a variable using the $let variable operator and the $arrayElemAt operator.
Next, we use the $switch operator to perform logical condition processing, or a case statement, against that variable.
The condition here is $gt, which returns true if the value contains "test". In that case the "then" expression splits the string on "-" and returns the $concat of "-" and the first element of the newly computed array. If the condition evaluates to false, we just return the variable.
In the case statement we use $indexOfCP, which returns -1 if there are no occurrences of "test".
let cursor = db.collection.aggregate(
[
{ "$project": {
"data": 1,
"version": {
"$let": {
"vars": {
"v": {
"$arrayElemAt": [
{ "$split": [ "$version", "." ] },
-1
]
}
},
"in": {
"$switch": {
"branches": [
{
"case": {
"$gt": [
{ "$indexOfCP": [ "$$v", "test" ] },
-1
]
},
"then": {
"$concat": [
"-",
"",
{ "$arrayElemAt": [
{ "$split": [ "$$v", "-" ] },
0
]}
]
}
}
],
"default": "$$v"
}
}
}
}
}}
]
)
The aggregation query produces something like this:
{ "_id" : ObjectId("57a98773cbbd42a2156260d8"), "data" : 11, "version" : "32" }
{ "_id" : ObjectId("57a98773cbbd42a2156260d9"), "data" : 55, "version" : "-42" }
As you can see, the "version" field data are string. If the data type for that field does not matter then, you can simply use the $out aggregation pipeline stage operator to write the result into a new collection or replace your collection.
{ "$out": "collection" }
If you need to convert your data to a floating point number then, because MongoDB does not provide a way to do type conversion out of the box except for integer to string, the only way to do this is to iterate the aggregation Cursor object and convert your value using parseFloat or Number, then update your documents using the $set operator and the bulkWrite() method for maximum efficiency.
let requests = [];
cursor.forEach(doc => {
requests.push({
"updateOne": {
"filter": { "_id": doc._id },
"update": {
"$set": {
"data": doc.data,
"version": parseFloat(doc.version)
}
}
}
});
if ( requests.length === 1000 ) {
// Execute per 1000 ops and re-init
db.collection.bulkWrite(requests);
requests = [];
}
});
// Clean up queues
if(requests.length > 0) {
db.collection.bulkWrite(requests);
}
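The conversion rule and the batching can be tried out without a server. The sketch below is plain JavaScript with invented sample documents; the helper names toVersionNumber and inBatches are assumptions, and the conversion returns a number directly rather than a string:

```javascript
// "0.0.32" -> 32, "0.0.42-test" -> -42 (same rule as the pipeline, but returning a number)
function toVersionNumber(version) {
  const last = version.split(".").pop();   // "32" or "42-test"
  const digits = last.split("-")[0];       // "32" or "42"
  const n = parseInt(digits, 10);
  return last.indexOf("test") > -1 ? -n : n;
}

// Split an array of bulk operations into batches of at most `size`
function inBatches(requests, size) {
  const batches = [];
  for (let i = 0; i < requests.length; i += size) {
    batches.push(requests.slice(i, i + size));
  }
  return batches;
}

const docs = [
  { _id: 1, version: "0.0.32" },
  { _id: 2, version: "0.0.42-test" },
  { _id: 3, version: "0.0.7" }
];
const requests = docs.map((doc) => ({
  updateOne: {
    filter: { _id: doc._id },
    update: { $set: { version: toVersionNumber(doc.version) } }
  }
}));
const batches = inBatches(requests, 2);
// batches.length is 2; each batch could be passed to db.collection.bulkWrite(batch)
```

A batch size of 2 is only used here to show the split; the listing above uses 1000.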
While the aggregation query works in MongoDB 3.4 or newer, our best bet for MongoDB 3.2 and earlier is mapReduce with the bulkWrite() method.
var results = db.collection.mapReduce(
function() {
var v = this.version.split(".")[2];
emit(this._id, v.indexOf("-") > -1 ? "-"+v.replace(/\D+/g, '') : v)
},
function(key, value) {},
{ "out": { "inline": 1 } }
)["results"];
results looks like this:
[
{
"_id" : ObjectId("57a98773cbbd42a2156260d8"),
"value" : "32"
},
{
"_id" : ObjectId("57a98773cbbd42a2156260d9"),
"value" : "-42"
}
]
From here you use the previous .forEach loop to update your documents.
From MongoDB 2.6 to 3.0 you will need to use the now deprecated Bulk() API and its associated methods, as shown in my answer here.

Return only matched sub-document elements within a nested array

The main collection is retailer, which contains an array for stores. Each store contains an array of offers (you can buy in this store). This offers array has an array of sizes. (See example below)
Now I try to find all offers, which are available in the size L.
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"stores" : [
{
"_id" : ObjectId("56f277b5279871c20b8b4783"),
"offers" : [
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"size": [
"XS",
"S",
"M"
]
},
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"size": [
"S",
"L",
"XL"
]
}
]
}
]
}
I've tried this query: db.getCollection('retailers').find({'stores.offers.size': 'L'})
I expect output like this:
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"stores" : [
{
"_id" : ObjectId("56f277b5279871c20b8b4783"),
"offers" : [
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"size": [
"S",
"L",
"XL"
]
}
]
}
]
}
But the output of my query also contains the non-matching offer with sizes XS, S and M.
How can I force MongoDB to return only the offers that matched my query?
Greetings and thanks.
So the query you have actually selects the "document" just like it should. But what you are looking for is to "filter the arrays" contained so that the elements returned only match the condition of the query.
The real answer is of course that unless you are really saving a lot of bandwidth by filtering out such detail then you should not even try, or at least beyond the first positional match.
MongoDB has a positional $ operator which will return an array element at the matched index from a query condition. However, this only returns the "first" matched index of the "outer" most array element.
db.getCollection('retailers').find(
{ 'stores.offers.size': 'L'},
{ 'stores.$': 1 }
)
In this case, it means the "stores" array position only. So if there were multiple "stores" entries, then only "one" of the elements that contained your matched condition would be returned. But that does nothing for the inner array of "offers", and as such every "offer" within the matched "stores" array element would still be returned.
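What that projection keeps and drops can be illustrated with a plain JavaScript stand-in. The function firstMatchingStore and the sample retailer with three stores are invented for illustration; this is a sketch of the behaviour described above, not the server implementation:

```javascript
// Hypothetical stand-in for the { "stores.$": 1 } projection on one matched document
function firstMatchingStore(doc, size) {
  // Index of the first store that has at least one offer in the requested size
  const index = doc.stores.findIndex((store) =>
    store.offers.some((offer) => offer.size.indexOf(size) !== -1)
  );
  return { _id: doc._id, stores: index === -1 ? [] : [ doc.stores[index] ] };
}

const retailer = {
  _id: "r1",
  stores: [
    { _id: "s1", offers: [ { _id: "o1", size: ["XS", "S", "M"] } ] },
    { _id: "s2", offers: [
      { _id: "o2", size: ["XS", "S", "M"] },
      { _id: "o3", size: ["S", "L", "XL"] }
    ] },
    { _id: "s3", offers: [ { _id: "o4", size: ["L"] } ] }
  ]
};

const projected = firstMatchingStore(retailer, "L");
// projected.stores contains only store "s2", still with both of its offers,
// and store "s3" is dropped even though it also has a matching offer
```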
MongoDB has no way of "filtering" this in a standard query, so the following does not work:
db.getCollection('retailers').find(
{ 'stores.offers.size': 'L'},
{ 'stores.$.offers.$': 1 }
)
The only tool MongoDB actually has for this level of manipulation is the aggregation framework. But the analysis should show you why you "probably" should not do this, and instead just filter the array in code.
In order of how you can achieve this per version.
First with MongoDB 3.2.x with using the $filter operation:
db.getCollection('retailers').aggregate([
{ "$match": { "stores.offers.size": "L" } },
{ "$project": {
"stores": {
"$filter": {
"input": {
"$map": {
"input": "$stores",
"as": "store",
"in": {
"_id": "$$store._id",
"offers": {
"$filter": {
"input": "$$store.offers",
"as": "offer",
"cond": {
"$setIsSubset": [ ["L"], "$$offer.size" ]
}
}
}
}
}
},
"as": "store",
"cond": { "$ne": [ "$$store.offers", [] ]}
}
}
}}
])
Then with MongoDB 2.6.x and above with $map and $setDifference:
db.getCollection('retailers').aggregate([
{ "$match": { "stores.offers.size": "L" } },
{ "$project": {
"stores": {
"$setDifference": [
{ "$map": {
"input": {
"$map": {
"input": "$stores",
"as": "store",
"in": {
"_id": "$$store._id",
"offers": {
"$setDifference": [
{ "$map": {
"input": "$$store.offers",
"as": "offer",
"in": {
"$cond": {
"if": { "$setIsSubset": [ ["L"], "$$offer.size" ] },
"then": "$$offer",
"else": false
}
}
}},
[false]
]
}
}
}
},
"as": "store",
"in": {
"$cond": {
"if": { "$ne": [ "$$store.offers", [] ] },
"then": "$$store",
"else": false
}
}
}},
[false]
]
}
}}
])
And finally in any version above MongoDB 2.2.x where the aggregation framework was introduced.
db.getCollection('retailers').aggregate([
{ "$match": { "stores.offers.size": "L" } },
{ "$unwind": "$stores" },
{ "$unwind": "$stores.offers" },
{ "$match": { "stores.offers.size": "L" } },
{ "$group": {
"_id": {
"_id": "$_id",
"storeId": "$stores._id",
},
"offers": { "$push": "$stores.offers" }
}},
{ "$group": {
"_id": "$_id._id",
"stores": {
"$push": {
"_id": "$_id.storeId",
"offers": "$offers"
}
}
}}
])
Let's break down the explanations.
MongoDB 3.2.x and greater
So generally speaking, $filter is the way to go here since it is designed with this purpose in mind. Since there are multiple levels of array, you need to apply it at each level. So first you are diving into each "offers" within "stores" to examine and $filter that content.
The simple comparison here is "Does the "size" array contain the element I am looking for". In this logical context, the short thing to do is use the $setIsSubset operation to compare an array ("set") of ["L"] to the target array. Where that condition is true ( it contains "L" ) then the array element for "offers" is retained and returned in the result.
In the higher level $filter, you are then looking to see if the result from that previous $filter returned an empty array [] for "offers". If it is not empty, then the element is returned or otherwise it is removed.
MongoDB 2.6.x
This is very similar to the modern process except that since there is no $filter in this version you can use $map to inspect each element and then use $setDifference to filter out any elements that were returned as false.
So $map is going to return the whole array, but the $cond operation just decides whether to return the element or instead a false value. In the comparison of $setDifference to a single element "set" of [false] all false elements in the returned array would be removed.
In all other ways, the logic is the same as above.
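The $map plus $setDifference combination can be pictured in plain JavaScript as a two-step process. The helper names below are made up, and the sketch does not model the fact that $setDifference treats its input as a set (duplicates collapse and order is not guaranteed), which is harmless here because every offer has a distinct _id:

```javascript
// Stand-in for $map with $cond: keep the element or replace it with false
function mapOrFalse(items, predicate) {
  return items.map((item) => (predicate(item) ? item : false));
}

// Stand-in for { "$setDifference": [ array, [false] ] }
function dropFalse(items) {
  return items.filter((item) => item !== false);
}

const offers = [
  { _id: "o1", size: ["XS", "S", "M"] },
  { _id: "o2", size: ["S", "L", "XL"] }
];

const mapped = mapOrFalse(offers, (offer) => offer.size.indexOf("L") !== -1);
// mapped still has two entries: false for "o1" and the full element for "o2"
const filtered = dropFalse(mapped);
// filtered has one entry: the "o2" offer
```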
MongoDB 2.2.x and up
So below MongoDB 2.6 the only tool for working with arrays is $unwind, and you should not use the aggregation framework "just" for this purpose.
The process indeed appears simple, by simply "taking apart" each array, filtering out the things you don't need then putting it back together. The main care is in the "two" $group stages, with the "first" to re-build the inner array, and the next to re-build the outer array. There are distinct _id values at all levels, so these just need to be included at every level of grouping.
But the problem is that $unwind is very costly. Though it still has a purpose, its main intended usage is not this sort of filtering per document. In fact, in modern releases its only usage should be when an element of the array(s) needs to become part of the "grouping key" itself.
Conclusion
So it's not a simple process to get matches at multiple levels of an array like this, and in fact it can be extremely costly if implemented incorrectly.
Only the two modern listings should ever be used for this purpose, as they employ a "single" pipeline stage in addition to the "query" $match in order to do the "filtering". The resulting effect is little more overhead than the standard forms of .find().
In general though, those listings still have an amount of complexity to them, and indeed unless you are really drastically reducing the content returned by such filtering in a way that makes a significant improvement in bandwidth used between the server and client, then you are better off filtering the result of the initial query and basic projection.
db.getCollection('retailers').find(
{ 'stores.offers.size': 'L'},
{ 'stores.$': 1 }
).forEach(function(doc) {
// Technically this is only "one" store. So omit the projection
// if you wanted more than "one" match
doc.stores = doc.stores.filter(function(store) {
store.offers = store.offers.filter(function(offer) {
return offer.size.indexOf("L") != -1;
});
return store.offers.length != 0;
});
printjson(doc);
})
So working with the returned object "post" query processing is far less obtuse than using the aggregation pipeline to do this. And as stated, the only "real" difference would be that you are discarding the other elements on the "server" as opposed to removing them "per document" when received, which may save a little bandwidth.
But unless you are doing this in a modern release with only $match and $project, then the "cost" of processing on the server will greatly outweigh the "gain" of reducing that network overhead by stripping the unmatched elements first.
In all cases, you get the same result:
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"stores" : [
{
"_id" : ObjectId("56f277b5279871c20b8b4783"),
"offers" : [
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"size" : [
"S",
"L",
"XL"
]
}
]
}
]
}
As your array is embedded we cannot use $elemMatch; instead you can use the aggregation framework to get your results:
db.retailers.aggregate([
{$match:{"stores.offers.size": 'L'}}, //just precondition can be skipped
{$unwind:"$stores"},
{$unwind:"$stores.offers"},
{$match:{"stores.offers.size": 'L'}},
{$group:{
_id:{id:"$_id", "storesId":"$stores._id"},
"offers":{$push:"$stores.offers"}
}},
{$group:{
_id:"$_id.id",
stores:{$push:{_id:"$_id.storesId","offers":"$offers"}}
}}
]).pretty()
What this query does is unwind the arrays (twice), match on size, and then reshape the documents back to their previous form. You can remove the $group steps and see how it prints.
Have fun!
It also works without aggregate, since find() projections accept aggregation expressions from MongoDB 4.4 onwards.
Here is the solution link: https://mongoplayground.net/p/Q5lxPvGK03A
db.collection.find({
"stores.offers.size": "L"
},
{
"stores": {
"$filter": {
"input": {
"$map": {
"input": "$stores",
"as": "store",
"in": {
"_id": "$$store._id",
"offers": {
"$filter": {
"input": "$$store.offers",
"as": "offer",
"cond": {
"$setIsSubset": [
[
"L"
],
"$$offer.size"
]
}
}
}
}
}
},
"as": "store",
"cond": {
"$ne": [
"$$store.offers",
[]
]
}
}
}
})

How to select custom data in mongo

Is there a way how to include custom data in the mongo query response?
What I mean is a mongo alternative for something like this in MySQL code:
SELECT
value,
'7' AS min_value
FROM
my_table
WHERE
value >= 7
...while the 7 should probably be a variable in the language where the mongo query is being called from.
Try the $literal operator if using the aggregation framework with a $match pipeline step as your query filter. For example, create a sample collection in mongo shell that has 10 test documents with the value field as an increasing integer (0 to 9):
for(x=0;x<10;x++){ db.my_table.insert({value: x }) }
Running the following aggregation pipeline:
var base = 7;
db.my_table.aggregate([
{
"$match": {
"value": { "$gte": base }
}
},
{
"$project": {
"value": 1,
"min_value": { "$literal": base }
}
}
])
would produce the result:
/* 0 */
{
"result" : [
{
"_id" : ObjectId("561e2bcc3d8f561c1548d39b"),
"value" : 7,
"min_value" : 7
},
{
"_id" : ObjectId("561e2bcc3d8f561c1548d39c"),
"value" : 8,
"min_value" : 7
},
{
"_id" : ObjectId("561e2bcc3d8f561c1548d39d"),
"value" : 9,
"min_value" : 7
}
],
"ok" : 1
}
The only things in MongoDB query actions that actually "modify" the results returned, other than the original document or "field selection", are the .aggregate() method or the JavaScript manipulation alternative in mapReduce.
Otherwise documents are returned "as is", or at least with just the selected fields or array entry specified.
So if you want something else returned from the server, then you need to use one of those methods:
var seven = 7;
db.collection.aggregate([
{ "$match": {
"value": { "$gte": seven }
}},
{ "$project": {
"value": 1,
"min_value": { "$literal": seven }
}}
])
This is where the $literal operator comes into play. In versions prior to 2.6 but greater than or equal to 2.2 (where the aggregation framework was introduced) you can use $const instead:
var seven = 7;
db.collection.aggregate([
{ "$match": {
"value": { "$gte": seven }
}},
{ "$project": {
"value": 1,
"min_value": { "$const": seven }
}}
])
Or just use mapReduce and its JavaScript translation:
var seven = 7;
db.collection.mapReduce(
function() {
emit(this._id,{ "value": this.value, "min_value": seven });
},
function() {}, // no reduce at all since all _id unique
{
"out": { "inline": 1 },
"query": { "value": { "$gte": seven } },
"scope": { "seven": seven }
}
);
Those are basically your options.
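Since the constant is meant to come from a variable in the calling language, the pipeline can be built by a small function. The name minValuePipeline is hypothetical; the stages themselves are the ones shown in the answers above, using $gte to match the >= in the SQL:

```javascript
// Hypothetical helper: build the two pipeline stages for
// "value >= min, plus min returned as a constant column"
function minValuePipeline(min) {
  return [
    { "$match": { "value": { "$gte": min } } },
    // $literal keeps the value from being parsed as an expression, a field path
    // or an inclusion flag
    { "$project": { "value": 1, "min_value": { "$literal": min } } }
  ];
}

const pipeline = minValuePipeline(7);
// The pipeline could then be run as: db.my_table.aggregate(pipeline)
```

The wrapper matters because a bare number in $project is read as "include this field", and a string starting with "$" is read as a field path, so passing the variable without $literal would not return the constant.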