Return only matched sub-document elements within a nested array - mongodb

The main collection is retailers, which contains an array of stores. Each store contains an array of offers (items you can buy in that store). Each offer has an array of sizes (see the example below).
I am trying to find all offers that are available in size L.
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"stores" : [
{
"_id" : ObjectId("56f277b5279871c20b8b4783"),
"offers" : [
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"size": [
"XS",
"S",
"M"
]
},
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"size": [
"S",
"L",
"XL"
]
}
]
}
]
}
I've tried this query: db.getCollection('retailers').find({'stores.offers.size': 'L'})
I expect output like this:
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"stores" : [
{
"_id" : ObjectId("56f277b5279871c20b8b4783"),
"offers" : [
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"size": [
"S",
"L",
"XL"
]
}
]
}
]
}
But the output of my query also contains the non-matching offer with sizes XS, S and M.
How can I force MongoDB to return only the offers that match my query?
Greetings and thanks.

So the query you have actually selects the "document" just like it should. But what you are looking for is to "filter the arrays" contained, so that the elements returned only match the condition of the query.
The real answer is of course that unless you are really saving a lot of bandwidth by filtering out such detail, you should not even try, or at least not beyond the first positional match.
MongoDB has a positional $ operator which will return an array element at the matched index from a query condition. However, this only returns the "first" matched index of the "outermost" array.
db.getCollection('retailers').find(
{ 'stores.offers.size': 'L'},
{ 'stores.$': 1 }
)
In this case, it means the "stores" array position only. So if there were multiple "stores" entries, then only "one" of the elements that contained your matched condition would be returned. But that does nothing for the inner array of "offers", and as such every "offer" within the matched "stores" element would still be returned.
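To make that limitation concrete, here is a rough plain-JavaScript emulation of what the positional projection keeps. It is only a sketch: the function name `projectFirstMatchingStore` and the sample ids are made up for illustration, and this is not how MongoDB implements the operator.

```javascript
// Plain-JavaScript emulation of the { 'stores.$': 1 } projection (not MongoDB code).
function projectFirstMatchingStore(doc, size) {
  // Find the first store holding at least one offer in the requested size.
  const index = doc.stores.findIndex(store =>
    store.offers.some(offer => offer.size.includes(size))
  );
  // Keep that single store as-is; its offers array is not filtered.
  return { _id: doc._id, stores: index === -1 ? [] : [doc.stores[index]] };
}

// Hypothetical sample data with two stores that both carry size "L".
const doc = {
  _id: "r1",
  stores: [
    { _id: "s1", offers: [{ _id: "o1", size: ["XS", "S", "M"] },
                          { _id: "o2", size: ["S", "L", "XL"] }] },
    { _id: "s2", offers: [{ _id: "o3", size: ["L"] }] }
  ]
};

const projected = projectFirstMatchingStore(doc, "L");
console.log(projected.stores.length);           // 1
console.log(projected.stores[0].offers.length); // 2
```

Only the first matching store survives, and the non-matching offer inside it is still there.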
MongoDB has no way of "filtering" this in a standard query, so the following does not work:
db.getCollection('retailers').find(
{ 'stores.offers.size': 'L'},
{ 'stores.$.offers.$': 1 }
)
The only tool MongoDB actually has for this level of manipulation is the aggregation framework. But the analysis should show you why you "probably" should not do this, and instead just filter the array in code.
Here is how you can achieve this, by version.
First, with MongoDB 3.2.x and above, using the $filter operation:
db.getCollection('retailers').aggregate([
{ "$match": { "stores.offers.size": "L" } },
{ "$project": {
"stores": {
"$filter": {
"input": {
"$map": {
"input": "$stores",
"as": "store",
"in": {
"_id": "$$store._id",
"offers": {
"$filter": {
"input": "$$store.offers",
"as": "offer",
"cond": {
"$setIsSubset": [ ["L"], "$$offer.size" ]
}
}
}
}
}
},
"as": "store",
"cond": { "$ne": [ "$$store.offers", [] ]}
}
}
}}
])
Then with MongoDB 2.6.x and above with $map and $setDifference:
db.getCollection('retailers').aggregate([
{ "$match": { "stores.offers.size": "L" } },
{ "$project": {
"stores": {
"$setDifference": [
{ "$map": {
"input": {
"$map": {
"input": "$stores",
"as": "store",
"in": {
"_id": "$$store._id",
"offers": {
"$setDifference": [
{ "$map": {
"input": "$$store.offers",
"as": "offer",
"in": {
"$cond": {
"if": { "$setIsSubset": [ ["L"], "$$offer.size" ] },
"then": "$$offer",
"else": false
}
}
}},
[false]
]
}
}
}
},
"as": "store",
"in": {
"$cond": {
"if": { "$ne": [ "$$store.offers", [] ] },
"then": "$$store",
"else": false
}
}
}},
[false]
]
}
}}
])
And finally, in any version from MongoDB 2.2.x onward, where the aggregation framework was introduced:
db.getCollection('retailers').aggregate([
{ "$match": { "stores.offers.size": "L" } },
{ "$unwind": "$stores" },
{ "$unwind": "$stores.offers" },
{ "$match": { "stores.offers.size": "L" } },
{ "$group": {
"_id": {
"_id": "$_id",
"storeId": "$stores._id",
},
"offers": { "$push": "$stores.offers" }
}},
{ "$group": {
"_id": "$_id._id",
"stores": {
"$push": {
"_id": "$_id.storeId",
"offers": "$offers"
}
}
}}
])
Let's break down the explanations.
MongoDB 3.2.x and greater
So generally speaking, $filter is the way to go here since it is designed with this purpose in mind. Since there are multiple levels of the array, you need to apply this at each level. So first you are diving into each "offers" within "stores" to examine and $filter that content.
The simple comparison here is "Does the "size" array contain the element I am looking for". In this logical context, the short thing to do is use the $setIsSubset operation to compare an array ("set") of ["L"] to the target array. Where that condition is true ( it contains "L" ) then the array element for "offers" is retained and returned in the result.
In the higher level $filter, you are then looking to see if the result from that previous $filter returned an empty array [] for "offers". If it is not empty, then the element is returned or otherwise it is removed.
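If you need this for more than one size value, the same two-level $filter can be produced by a small builder. This is a minimal sketch: `buildSizeFilterPipeline` is a hypothetical helper name, and it assumes the size is a plain string that does not start with "$" (which the server would read as a field path).

```javascript
// Hypothetical helper: builds the $match + $project pipeline for any size value.
function buildSizeFilterPipeline(size) {
  return [
    // Select only documents that contain the size somewhere.
    { $match: { "stores.offers.size": size } },
    { $project: {
      stores: {
        $filter: {
          // Inner pass: rewrite each store with only its matching offers.
          input: {
            $map: {
              input: "$stores",
              as: "store",
              in: {
                _id: "$$store._id",
                offers: {
                  $filter: {
                    input: "$$store.offers",
                    as: "offer",
                    cond: { $setIsSubset: [[size], "$$offer.size"] }
                  }
                }
              }
            }
          },
          // Outer pass: drop stores whose filtered offers array is empty.
          as: "store",
          cond: { $ne: ["$$store.offers", []] }
        }
      }
    } }
  ];
}

// Shell usage: db.getCollection('retailers').aggregate(buildSizeFilterPipeline("L"))
const pipeline = buildSizeFilterPipeline("L");
```

The builder only assembles the pipeline document, so it can be inspected or logged before it is sent to the server.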
MongoDB 2.6.x
This is very similar to the modern process except that since there is no $filter in this version you can use $map to inspect each element and then use $setDifference to filter out any elements that were returned as false.
So $map is going to return the whole array, but the $cond operation just decides whether to return the element or instead a false value. In the comparison of $setDifference to a single element "set" of [false] all false elements in the returned array would be removed.
In all other ways, the logic is the same as above.
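The two steps can be mimicked in plain JavaScript to see what each contributes. This is only an emulation for illustration, not MongoDB code; the `setDifference` function is a stand-in written for this sketch, and the numeric ids are made up.

```javascript
// Plain-JavaScript emulation of the $map + $setDifference idiom (not MongoDB code).
// Like the operator, this treats both inputs as sets: duplicates collapse,
// and entries found in the second array are removed.
function setDifference(a, b) {
  const exclude = new Set(b.map(x => JSON.stringify(x)));
  const seen = new Set();
  const out = [];
  for (const item of a) {
    const key = JSON.stringify(item);
    if (!exclude.has(key) && !seen.has(key)) {
      seen.add(key);
      out.push(item);
    }
  }
  return out;
}

const offers = [
  { _id: 1, size: ["XS", "S", "M"] },
  { _id: 2, size: ["S", "L", "XL"] }
];

// Step 1 ($map + $cond): keep the element or substitute false.
const mapped = offers.map(o => (o.size.includes("L") ? o : false));

// Step 2 ($setDifference against [false]): strip the placeholders.
const filtered = setDifference(mapped, [false]);
console.log(filtered.length); // 1
```

Because the result is a "set", identical elements would collapse into one. The distinct _id values on each offer and store are what make this safe here.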
MongoDB 2.2.x and up
Below MongoDB 2.6 the only tool for working with arrays is $unwind, and you should not use the aggregation framework "just" for this purpose.
The process indeed appears simple, by simply "taking apart" each array, filtering out the things you don't need then putting it back together. The main care is in the "two" $group stages, with the "first" to re-build the inner array, and the next to re-build the outer array. There are distinct _id values at all levels, so these just need to be included at every level of grouping.
But the problem is that $unwind is very costly. Though it still has a purpose, its main intended use is not this sort of per-document filtering. In fact, in modern releases its only use should be when an element of the array(s) needs to become part of the "grouping key" itself.
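A rough way to see where the cost comes from is to emulate $unwind in plain JavaScript and count the documents it produces. This is an illustration only (the `unwind` function and the sample ids are invented for the sketch), but the multiplication it shows is what the pipeline has to process before the $group stages can put things back together.

```javascript
// Plain-JavaScript emulation of $unwind (not MongoDB code): one output
// document per array element, with the array field replaced by that element.
function unwind(docs, field) {
  return docs.flatMap(doc =>
    (doc[field] || []).map(element => ({ ...doc, [field]: element }))
  );
}

// Hypothetical retailer with two stores and three offers in total.
const retailer = {
  _id: "r1",
  stores: [
    { _id: "s1", offers: [{ _id: "o1", size: ["XS", "S", "M"] },
                          { _id: "o2", size: ["S", "L", "XL"] }] },
    { _id: "s2", offers: [{ _id: "o3", size: ["M"] }] }
  ]
};

// First $unwind: one document per store.
const perStore = unwind([retailer], "stores");

// Second $unwind on "stores.offers": one document per (store, offer) pair.
const perOffer = perStore.flatMap(doc =>
  doc.stores.offers.map(offer => ({
    ...doc,
    stores: { ...doc.stores, offers: offer }
  }))
);

console.log(perStore.length, perOffer.length); // 2 3
```

Every source document becomes as many documents as it has offers, each carrying a copy of the parent fields.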
Conclusion
So it's not a simple process to get matches at multiple levels of an array like this, and in fact it can be extremely costly if implemented incorrectly.
Only the two modern listings should ever be used for this purpose, as they employ a "single" pipeline stage in addition to the "query" $match in order to do the "filtering". The resulting effect is little more overhead than the standard forms of .find().
In general though, those listings still have an amount of complexity to them, and indeed unless you are really drastically reducing the content returned by such filtering in a way that makes a significant improvement in bandwidth used between the server and client, then you are better off filtering the result of the initial query and basic projection.
db.getCollection('retailers').find(
{ 'stores.offers.size': 'L'},
{ 'stores.$': 1 }
).forEach(function(doc) {
// Technically this is only "one" store. So omit the projection
// if you wanted more than "one" match
doc.stores = doc.stores.filter(function(store) {
store.offers = store.offers.filter(function(offer) {
return offer.size.indexOf("L") != -1;
});
return store.offers.length != 0;
});
printjson(doc);
})
So working with the returned object "post" query processing is far less obtuse than using the aggregation pipeline to do this. And as stated, the only "real" difference would be that you are discarding the other elements on the "server" as opposed to removing them "per document" when received, which may save a little bandwidth.
But unless you are doing this in a modern release with only $match and $project, then the "cost" of processing on the server will greatly outweigh the "gain" of reducing that network overhead by stripping the unmatched elements first.
In all cases, you get the same result:
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"stores" : [
{
"_id" : ObjectId("56f277b5279871c20b8b4783"),
"offers" : [
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"size" : [
"S",
"L",
"XL"
]
}
]
}
]
}

As your array is embedded, we cannot use $elemMatch; instead you can use the aggregation framework to get your results:
db.retailers.aggregate([
{$match:{"stores.offers.size": 'L'}}, //just precondition can be skipped
{$unwind:"$stores"},
{$unwind:"$stores.offers"},
{$match:{"stores.offers.size": 'L'}},
{$group:{
_id:{id:"$_id", "storesId":"$stores._id"},
"offers":{$push:"$stores.offers"}
}},
{$group:{
_id:"$_id.id",
stores:{$push:{_id:"$_id.storesId","offers":"$offers"}}
}}
]).pretty()
What this query does is unwind the arrays (twice), then match on size, and then reshape the document back to its previous form. You can remove the $group steps and see how it prints.
Have fun!

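This also works without aggregate, provided the server is MongoDB 4.4 or later, which accepts aggregation expressions in a find() projection.
Here is the solution link: https://mongoplayground.net/p/Q5lxPvGK03A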
db.collection.find({
"stores.offers.size": "L"
},
{
"stores": {
"$filter": {
"input": {
"$map": {
"input": "$stores",
"as": "store",
"in": {
"_id": "$$store._id",
"offers": {
"$filter": {
"input": "$$store.offers",
"as": "offer",
"cond": {
"$setIsSubset": [
[
"L"
],
"$$offer.size"
]
}
}
}
}
}
},
"as": "store",
"cond": {
"$ne": [
"$$store.offers",
[]
]
}
}
}
})

Related

How to perform operations in pipeline in mongo $lookup [duplicate]

I have the following collections:
venue collection
{ "_id" : ObjectId("5acdb8f65ea63a27c1facf86"),
"name" : "ASA College - Manhattan Campus",
"addedBy" : ObjectId("5ac8ba3582c2345af70d4658"),
"reviews" : [
ObjectId("5acdb8f65ea63a27c1facf8b"),
ObjectId("5ad8288ccdd9241781dce698")
]
}
reviews collection
{ "_id" : ObjectId("5acdb8f65ea63a27c1facf8b"),
"createdAt" : ISODate("2018-04-07T12:31:49.503Z"),
"venue" : ObjectId("5acdb8f65ea63a27c1facf86"),
"author" : ObjectId("5ac8ba3582c2345af70d4658"),
"content" : "nice place",
"comments" : [
ObjectId("5ad87113882d445c5cbc92c8")
]
}
comment collection
{ "_id" : ObjectId("5ad87113882d445c5cbc92c8"),
"author" : ObjectId("5ac8ba3582c2345af70d4658"),
"comment" : "dcfdsfdcfdsfdcfdsfdcfdsfdcfdsfdcfdsfdcfdsfdcfdsf",
"review" : ObjectId("5acdb8f65ea63a27c1facf8b"),
"__v" : 0
}
author collection
{ "_id" : ObjectId("5ac8ba3582c2345af70d4658"),
"firstName" : "Bruce",
"lastName" : "Wayne",
"email" : "bruce#linkites.com",
"followers" : [ObjectId("5ac8b91482c2345af70d4650")]
}
Now the following populate query works fine
const venues = await Venue.findOne({ _id: id.id })
.populate({
path: 'reviews',
options: { sort: { createdAt: -1 } },
populate: [
{ path: 'author' },
{ path: 'comments', populate: [{ path: 'author' }] }
]
})
However, I want to achieve this with a $lookup query, but it splits the venue when I apply '$unwind' to the reviews... I want the reviews in the same array (like populate) and in the same order...
I want to do this with $lookup because the author has a followers field, so I need to send an isFollow field via $project, which cannot be done using populate...
$project: {
isFollow: { $in: [mongoose.Types.ObjectId(req.user.id), '$followers'] }
}
There are a couple of approaches of course depending on your available MongoDB version. These vary from different usages of $lookup through to enabling object manipulation on the .populate() result via .lean().
I do ask that you read the sections carefully, and be aware that all may not be as it seems when considering your implementation solution.
MongoDB 3.6, "nested" $lookup
With MongoDB 3.6 the $lookup operator gets the additional ability to include a pipeline expression, as opposed to simply joining a "local" to "foreign" key value. What this means is you can essentially do each $lookup as "nested" within these pipeline expressions:
Venue.aggregate([
{ "$match": { "_id": mongoose.Types.ObjectId(id.id) } },
{ "$lookup": {
"from": Review.collection.name,
"let": { "reviews": "$reviews" },
"pipeline": [
{ "$match": { "$expr": { "$in": [ "$_id", "$$reviews" ] } } },
{ "$lookup": {
"from": Comment.collection.name,
"let": { "comments": "$comments" },
"pipeline": [
{ "$match": { "$expr": { "$in": [ "$_id", "$$comments" ] } } },
{ "$lookup": {
"from": Author.collection.name,
"let": { "author": "$author" },
"pipeline": [
{ "$match": { "$expr": { "$eq": [ "$_id", "$$author" ] } } },
{ "$addFields": {
"isFollower": {
"$in": [
mongoose.Types.ObjectId(req.user.id),
"$followers"
]
}
}}
],
"as": "author"
}},
{ "$addFields": {
"author": { "$arrayElemAt": [ "$author", 0 ] }
}}
],
"as": "comments"
}},
{ "$sort": { "createdAt": -1 } }
],
"as": "reviews"
}},
])
This can be really quite powerful. As you see from the perspective of the original pipeline, it really only knows about adding content to the "reviews" array, and then each subsequent "nested" pipeline expression also only ever sees its "inner" elements from the join.
It is powerful and in some respects it may be a bit clearer as all field paths are relative to the nesting level, but it does start that indentation creep in the BSON structure, and you do need to be aware of whether you are matching to arrays or singular values in traversing the structure.
Note we can also do things here like "flattening the author property" as seen within the "comments" array entries. All $lookup target output may be an "array", but within a "sub-pipeline" we can re-shape that single element array into just a single value.
Standard MongoDB $lookup
Still keeping the "join on the server" you can actually do it with $lookup, but it just takes intermediate processing. This is the long-standing approach of deconstructing an array with $unwind and then using $group stages to rebuild arrays:
Venue.aggregate([
{ "$match": { "_id": mongoose.Types.ObjectId(id.id) } },
{ "$lookup": {
"from": Review.collection.name,
"localField": "reviews",
"foreignField": "_id",
"as": "reviews"
}},
{ "$unwind": "$reviews" },
{ "$lookup": {
"from": Comment.collection.name,
"localField": "reviews.comments",
"foreignField": "_id",
"as": "reviews.comments",
}},
{ "$unwind": "$reviews.comments" },
{ "$lookup": {
"from": Author.collection.name,
"localField": "reviews.comments.author",
"foreignField": "_id",
"as": "reviews.comments.author"
}},
{ "$unwind": "$reviews.comments.author" },
{ "$addFields": {
"reviews.comments.author.isFollower": {
"$in": [
mongoose.Types.ObjectId(req.user.id),
"$reviews.comments.author.followers"
]
}
}},
{ "$group": {
"_id": {
"_id": "$_id",
// After $unwind the review is still under the "reviews" path
"reviewId": "$reviews._id"
},
"name": { "$first": "$name" },
"addedBy": { "$first": "$addedBy" },
"review": {
"$first": {
"_id": "$reviews._id",
"createdAt": "$reviews.createdAt",
"venue": "$reviews.venue",
"author": "$reviews.author",
"content": "$reviews.content"
}
},
"comments": { "$push": "$reviews.comments" }
}},
{ "$sort": { "_id._id": 1, "review.createdAt": -1 } },
{ "$group": {
"_id": "$_id._id",
"name": { "$first": "$name" },
"addedBy": { "$first": "$addedBy" },
"reviews": {
"$push": {
"_id": "$review._id",
"venue": "$review.venue",
"author": "$review.author",
"content": "$review.content",
"comments": "$comments"
}
}
}}
])
This really is not as daunting as you might think at first and follows a simple pattern of $lookup and $unwind as you progress through each array.
The "author" detail of course is singular, so once that is "unwound" you simply want to leave it that way, make the field addition and start the process of "rolling back" into the arrays.
There are only two levels to reconstruct back to the original Venue document, so the first detail level is by Review to rebuild the "comments" array. All you need to do is $push the path of "$reviews.comments" in order to collect these, and as long as the "$reviews._id" field is in the "grouping _id" the only other things you need to keep are all the other fields. You can put all of these into the _id as well, or you can use $first.
With that done there is only one more $group stage in order to get back to Venue itself. This time the grouping key is "$_id" of course, with all properties of the venue itself using $first and the remaining "$review" details going back into an array with $push. Of course the "$comments" output from the previous $group becomes the "review.comments" path.
Working on a single document and its relations, this is not really so bad. The $unwind pipeline operator can generally be a performance issue, but in the context of this usage it should not really cause that much of an impact.
Since the data is still being "joined on the server" there is still far less traffic than the other remaining alternative.
JavaScript Manipulation
Of course the other case here is that instead of changing data on the server itself, you actually manipulate the result. In most cases I would be in favor of this approach since any "additions" to the data are probably best handled on the client.
The problem of course with using populate() is that whilst it may 'look like' a much more simplified process, it is in fact NOT A JOIN in any way. All populate() actually does is "hide" the underlying process of submitting multiple queries to the database, and then awaiting the results through async handling.
So the "appearance" of a join is actually the result of multiple requests to the server and then doing "client side manipulation" of the data to embed the details within arrays.
So aside from that clear warning that the performance characteristics are nowhere close to being on par with a server $lookup, the other caveat is of course that the "mongoose Documents" in the result are not actually plain JavaScript objects subject to further manipulation.
So in order to take this approach, you need to add the .lean() method to the query before execution, in order to instruct mongoose to return "plain JavaScript objects" instead of Document types which are cast with schema methods attached to the model. Noting of course that the resulting data no longer has access to any "instance methods" that would otherwise be associated with the related models themselves:
let venue = await Venue.findOne({ _id: id.id })
.populate({
path: 'reviews',
options: { sort: { createdAt: -1 } },
populate: [
{ path: 'comments', populate: [{ path: 'author' }] }
]
})
.lean();
Now that venue is a plain object, we can simply process and adjust it as needed:
venue.reviews = venue.reviews.map( r =>
({
...r,
comments: r.comments.map( c =>
({
...c,
author: {
...c.author,
isFollower: c.author.followers.map( f => f.toString() ).indexOf(req.user.id) != -1
}
})
)
})
);
So it's really just a matter of cycling through each of the inner arrays down until the level where you can see the followers array within the author details. The comparison then can be made against the ObjectId values stored in that array after first using .map() to return the "string" values for comparison against the req.user.id which is also a string (if it is not, then also add .toString() on that ), since it is easier in general to compare these values in this way via JavaScript code.
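The string comparison described above can be wrapped in a small function so it is not repeated at every nesting level. This is just a sketch: `isFollowedBy` and `fakeObjectId` are names invented here, and the stand-in object only imitates the `toString()` behaviour of an ObjectId so the example runs without a database driver.

```javascript
// Hypothetical helper: true when userId appears in the followers array.
// String() works for ObjectId-like values and for plain strings alike.
function isFollowedBy(followers, userId) {
  const target = String(userId);
  return (followers || []).some(f => String(f) === target);
}

// Stand-in for an ObjectId: any object whose toString() yields the hex string.
const fakeObjectId = hex => ({ toString: () => hex });

const followers = [fakeObjectId("5ac8b91482c2345af70d4650")];

console.log(isFollowedBy(followers, "5ac8b91482c2345af70d4650")); // true
console.log(isFollowedBy(followers, "5ac8ba3582c2345af70d4658")); // false
console.log(isFollowedBy(undefined, "5ac8b91482c2345af70d4650")); // false
```

Converting both sides with String() also covers the case where req.user.id is not already a string, and the empty-array fallback avoids an exception for authors without a followers field.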
Again though, I need to stress that it "looks simple" but it is in fact the sort of thing you really want to avoid for system performance. Those additional queries and the transfer between the server and the client cost a lot in processing time, and the request overhead adds up to real costs in transport between hosting providers.
Summary
Those are basically the approaches you can take, short of "rolling your own" where you actually perform the "multiple queries" to the database yourself instead of using the helper that .populate() is.
Using the populate output, you can then simply manipulate the data in result just like any other data structure, as long as you apply .lean() to the query to convert or otherwise extract the plain object data from the mongoose documents returned.
Whilst the aggregate approaches look far more involved, there are "a lot" more advantages to doing this work on the server. Larger result sets can be sorted, calculations can be done for further filtering, and of course you get a "single response" to a "single request" made to the server, all with no additional overhead.
It is totally arguable that the pipelines themselves could simply be constructed based on attributes already stored on the schema. So writing your own method to perform this "construction" based on the attached schema should not be too difficult.
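As a rough idea of what such a "construction" method might look like, here is a sketch that builds the MongoDB 3.6 nested $lookup stages from a plain descriptor rather than from a real mongoose schema. Everything about the descriptor shape (`from`, `field`, `isArray`, `nested`, `extraStages`), the function name `buildLookupStages`, and the literal collection names is an assumption made for illustration.

```javascript
// Hypothetical descriptor-driven builder for "nested" $lookup stages.
// from: target collection, field: local field holding the reference(s),
// isArray: whether that field is an array, nested: lookups to run inside,
// extraStages: additional stages appended to the sub-pipeline.
function buildLookupStages({ from, field, isArray, nested = [], extraStages = [] }) {
  const lookup = {
    $lookup: {
      from,
      let: { refs: "$" + field },
      pipeline: [
        // Arrays of references match with $in, single references with $eq.
        { $match: { $expr: isArray
            ? { $in: ["$_id", "$$refs"] }
            : { $eq: ["$_id", "$$refs"] } } },
        ...nested.flatMap(child => buildLookupStages(child)),
        ...extraStages
      ],
      as: field
    }
  };
  // A single-valued reference still comes back as an array, so flatten it.
  return isArray
    ? [lookup]
    : [lookup, { $addFields: { [field]: { $arrayElemAt: ["$" + field, 0] } } }];
}

// Assumed collection names; in the listings above these come from
// Review.collection.name, Comment.collection.name and Author.collection.name.
const stages = buildLookupStages({
  from: "reviews", field: "reviews", isArray: true,
  nested: [{
    from: "comments", field: "comments", isArray: true,
    nested: [{ from: "authors", field: "author", isArray: false }]
  }],
  extraStages: [{ $sort: { createdAt: -1 } }]
});
```

The resulting stages would follow the initial $match on the venue. Deriving the descriptor from the schema's `ref` attributes is left out here, since that depends on how the models are defined.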
In the longer term of course $lookup is the better solution, but you'll probably need to put a little more work into the initial coding, if of course you don't just simply copy from what is listed here ;)

MongoDB: Why $literal required ? And where it can be used?

I have gone through MongoDB $literal in Aggregation framework, but I don't understand where it could be used ? more importantly, why it is required ?
Example from official MongoDB documentation,
db.records.aggregate( [
{ $project: { costsOneDollar: { $eq: [ "$price", { $literal: "$1" } ] } } }
])
Instead of the above example using $literal, why can't I use as below ?
db.records.aggregate( [
{ $project: { costsOneDollar: { $eq: [ "$price", "$1" ] } } }
] )
Also provide some other example which shows the best(or effective) usage of $literal.
For your basic case I think the documentation is fairly self explanatory:
In expression, the dollar sign $ evaluates to a field path; i.e. provides access to the field. For example, the $eq expression $eq: [ "$price", "$1" ] performs an equality check between the value in the field named price and the value in the field named 1 in the document.
So since $ is reserved for evaluation of field path values within the document, this would be considered to actually be looking for a "field" named 1 within the document. So the actual comparison would be between the field named "price" and a field named "1", and since there is no field named "1" it would be treated as null, and the result is therefore false for every document.
On the other hand where the field "price" actually has a value equal to "$1", then the usage of $literal allows that "value" ( and not the field path reference ) to be considered. Hence "literal".
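To make the distinction concrete, here is a toy model in plain JavaScript. It is only an illustration of the rule described above, not how MongoDB is implemented; resolveOperand and eq are hypothetical names.

```javascript
// Toy model only: resolve an aggregation-style operand against a document.
function resolveOperand(operand, doc) {
  if (operand !== null && typeof operand === "object" && "$literal" in operand) {
    return operand.$literal; // taken as-is, never parsed as a field path
  }
  if (typeof operand === "string" && operand.startsWith("$")) {
    const value = doc[operand.slice(1)]; // "$price" -> doc.price
    return value === undefined ? null : value; // missing field treated as null
  }
  return operand; // any other value is already a plain value
}

// Simplified stand-in for { $eq: [ left, right ] }.
function eq([left, right], doc) {
  return resolveOperand(left, doc) === resolveOperand(right, doc);
}

const doc = { price: "$1" };

// "$1" is read as a path to a field named "1", which does not exist.
console.log(eq(["$price", "$1"], doc)); // false

// With $literal the string "$1" itself is compared.
console.log(eq(["$price", { $literal: "$1" }], doc)); // true
```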
The operator has actually been around for some time ( since MongoDB 2.2 actually ) but under the guise of $const, which though not documented is still the basic operator, and $literal is really just an "alias" for that.
The usage mainly is and always has been to use where an expression is required to have some "specific value" as instructed within the pipeline. Take this simple statement:
{ "$project": { "myField": "one" } }
There are any number of reasons you might want to do that, and basically return a "literal" value in such a statement. But if you tried, it would result in an error, as it essentially does not resolve to either a "field path" or a boolean condition for field selection, as is required here. So if you instead use:
{ "$project": { "myField": { "$literal": "one" } } }
Then you have "myField" with a value of "one" just like you asked for.
Other usages are more historic, such as:
{ "$project": { "array": { "$literal": ["A","B","C" ] } } },
{ "$unwind": "$array" },
{ "$group": {
"_id": "$_id",
"trans": { "$push": {
"$cond": [
{ "$eq": [ "$array", "A" ] },
"$fieldA",
{ "$cond": [
{ "$eq": [ "$array", "B" ] },
"$fieldB",
"$fieldC"
]}
]
}}
}}
Which might more modernly be replaced with something like:
{ "$project": {
"trans": {
"$map": {
"input": ["A","B","C"],
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el", "A" ] },
"$fieldA",
{ "$cond": [
{ "$eq": [ "$$el", "B" ] },
"$fieldB",
"$fieldC"
]}
]
}
}
}
}}
As a construct to move selected fields into an array based on position, with the difference being that as "array" and a field assignment the $literal is necessary, but as the "input" argument the plain array notation is just fine.
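For readers more comfortable with client-side code, the effect of both pipeline forms on a single document is roughly the following plain JavaScript. The function name fieldsToArray and the sample values are made up for illustration.

```javascript
// Rough client-side equivalent of the $map / $cond construct:
// walk the marker array and pick a field for each position.
function fieldsToArray(doc) {
  return ["A", "B", "C"].map((el) =>
    el === "A" ? doc.fieldA : el === "B" ? doc.fieldB : doc.fieldC
  );
}

const sample = { fieldA: 1, fieldB: 2, fieldC: 3 }; // hypothetical document
console.log(fieldsToArray(sample)); // [ 1, 2, 3 ]
```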
So the general cases are:
Where something reserved such as $ is needed as the value to match
Where there is a specific value to inject as a field assignment, and not as an argument to another operator expression.
The $1 example you give would try and compare the price field with the 1 field. By specifying the $literal operator, you're telling MongoDB that it is the exact string "$1". The same might be true if you wanted to use a MongoDB function name as a field name in your code, or even using a query snippet as a field value.

How to search embedded array

I want to get all matching values, using $elemMatch.
// create test data
db.foo.insert({values:[0,1,2,3,4,5,6,7,8,9]})
db.foo.find({},{
'values':{
'$elemMatch':{
'$gt':3
}
}
}) ;
My expected result is {values:[3,4,5,6,7,8,9]}, but the actual result is {values:[4]}.
I read the MongoDB documentation and understand that this is the specified behaviour.
How do I search for multiple values?
I also need to use 'skip' and 'limit'.
Any ideas?
Using Aggregation:
db.foo.aggregate([
{$unwind:"$values"},
{$match:{"values":{$gt:3}}},
{$group:{"_id":"$_id","values":{$push:"$values"}}}
])
You can add further filter condition in the $match, if you would like to.
You can't achieve this using the $elemMatch operator, since the MongoDB documentation says:
The $elemMatch projection operator limits the contents of an array
field that is included in the query results to contain only the array
element that matches the $elemMatch condition.
Note
The elements of the array are documents.
If you look carefully at the documentation on $elemMatch, or on its query-side counterpart the positional $ operator, then you would see that only the "first" matched element is returned by this type of "projection".
What you are looking for is actually "manipulation" of the document contents where you want to "filter" the content of the array in the document rather than return the original or "matched" element, as there can be only one match.
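A loose analogy in plain JavaScript, using the array from the question: the projection behaves like Array.prototype.find, while what is being asked for is Array.prototype.filter. This is only an analogy for the behaviour, not MongoDB code.

```javascript
const values = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];

// Like the $elemMatch projection: stops at the first element that matches.
const firstMatch = values.find((v) => v > 3);

// What the question is really after: every element that matches.
const allMatches = values.filter((v) => v > 3);

console.log(firstMatch); // 4
console.log(allMatches); // [ 4, 5, 6, 7, 8, 9 ]
```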
For true "filtering" you need the aggregation framework, as there is more support there for document manipulation:
db.foo.aggregate([
// No point selecting documents that do not match your condition
{ "$match": { "values": { "$gt": 3 } } },
// Unwind the array to de-normalize as documents
{ "$unwind": "$values" },
// Match to "filter" the array
{ "$match": { "values": { "$gt": 3 } } },
// Group by to the array form
{ "$group": {
"_id": "$_id",
"values": { "$push": "$values" }
}}
])
Or with modern versions of MongoDB from 2.6 and onwards, where the array values are "unique" you could do this:
db.foo.aggregate([
{ "$project": {
"values": {
"$setDifference": [
{ "$map": {
"input": "$values",
"as": "el",
"in": {
"$cond": [
{ "$gt": [ "$$el", 3 ] },
"$$el",
false
]
}
}},
[false]
]
}
}}
])
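The reason for the "unique" caveat is easier to see in a plain JavaScript model of the same trick: non-matching elements become false, the result is treated as a set, and false is then removed. The helper name filterViaSetDifference is hypothetical, and the real $setDifference does not guarantee the order of its output, which this sketch does not try to reproduce.

```javascript
// Model of the $map + $setDifference trick:
// 1. replace non-matching elements with false ($map with $cond)
// 2. treat the result as a set and drop false ($setDifference with [false])
function filterViaSetDifference(values, predicate) {
  const mapped = values.map((el) => (predicate(el) ? el : false));
  // Set semantics: duplicates collapse before false is removed.
  return [...new Set(mapped)].filter((el) => el !== false);
}

console.log(filterViaSetDifference([0, 1, 2, 3, 4, 5], (v) => v > 3)); // [ 4, 5 ]

// With repeated values the duplicates are lost, hence the "unique" requirement.
console.log(filterViaSetDifference([4, 4, 5], (v) => v > 3)); // [ 4, 5 ]
```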

MongoDB Nested Array Intersection Query

and thank you in advance for your help.
I have a mongoDB database structured like this:
{
'_id' : objectID(...),
'userID' : id,
'movies' : [{
'movieID' : movieID,
'rating' : rating
}]
}
My question is:
I want to search for a specific user, for example one that has 'userID' : 3, and get all of his movies. Then I want to get all the other users that have at least 15 movies with the same 'movieID'. From that group I want to select only the users that share those 15 movies and also have one extra 'movieID' that I choose.
I already tried aggregation, but failed, and if I do single queries, like getting all the movies for a user and then cycling through every user's movies and comparing them, it takes a lot of time.
Any ideas?
Thank you
There are a couple of ways to do this using the aggregation framework
Just a simple set of data for example:
{
"_id" : ObjectId("538181738d6bd23253654690"),
"movies": [
{ "_id": 1, "rating": 5 },
{ "_id": 2, "rating": 6 },
{ "_id": 3, "rating": 7 }
]
},
{
"_id" : ObjectId("538181738d6bd23253654691"),
"movies": [
{ "_id": 1, "rating": 5 },
{ "_id": 4, "rating": 6 },
{ "_id": 2, "rating": 7 }
]
},
{
"_id" : ObjectId("538181738d6bd23253654692"),
"movies": [
{ "_id": 2, "rating": 5 },
{ "_id": 5, "rating": 6 },
{ "_id": 6, "rating": 7 }
]
}
Using the first "user" as an example, now you want to find if any of the other two users have at least two of the same movies.
For MongoDB 2.6 and upwards you can simply use the $setIntersection operator along with the $size operator:
db.users.aggregate([
// Match the possible documents to reduce the working set
{ "$match": {
"_id": { "$ne": ObjectId("538181738d6bd23253654690") },
"movies._id": { "$in": [ 1, 2, 3 ] },
"$and": [
{ "movies": { "$not": { "$size": 1 } } }
]
}},
// Project a copy of the document if you want to keep more than `_id`
{ "$project": {
"_id": {
"_id": "$_id",
"movies": "$movies"
},
"movies": 1,
}},
// Unwind the array
{ "$unwind": "$movies" },
// Build the array back with just `_id` values
{ "$group": {
"_id": "$_id",
"movies": { "$push": "$movies._id" }
}},
// Find the "set intersection" of the two arrays
{ "$project": {
"movies": {
"$size": {
"$setIntersection": [
[ 1, 2, 3 ],
"$movies"
]
}
}
}},
// Filter the results to those that actually match
{ "$match": { "movies": { "$gte": 2 } } }
])
This is still possible in earlier versions of MongoDB that do not have those operators, just using a few more steps:
db.users.aggregate([
// Match the possible documents to reduce the working set
{ "$match": {
"_id": { "$ne": ObjectId("538181738d6bd23253654690") },
"movies._id": { "$in": [ 1, 2, 3 ] },
"$and": [
{ "movies": { "$not": { "$size": 1 } } }
]
}},
// Project a copy of the document along with the "set" to match
{ "$project": {
"_id": {
"_id": "$_id",
"movies": "$movies"
},
"movies": 1,
"set": { "$cond": [ 1, [ 1, 2, 3 ], 0 ] }
}},
// Unwind both those arrays
{ "$unwind": "$movies" },
{ "$unwind": "$set" },
// Group back the count where both `_id` values are equal
{ "$group": {
"_id": "$_id",
"movies": {
"$sum": {
"$cond":[
{ "$eq": [ "$movies._id", "$set" ] },
1,
0
]
}
}
}},
// Filter the results to those that actually match
{ "$match": { "movies": { "$gte": 2 } } }
])
In Detail
That may be a bit to take in, so we can take a look at each stage and break those down to see what they are doing.
$match : You do not want to operate on every document in the collection, so this is an opportunity to remove the items that cannot possibly match, even if there is still more work to do to find the exact ones. So the obvious things are to exclude the same "user" and then only match the documents that have at least one of the same movies as was found for that "user".
The next thing that makes sense is to consider that when you want to match n entries then only documents that have a "movies" array that is larger than n-1 can possibly actually contain matches. The use of $and here looks funny and is not required specifically, but if the required matches were 4 then that actual part of the statement would look like this:
"$and": [
{ "movies": { "$not": { "$size": 1 } } },
{ "movies": { "$not": { "$size": 2 } } },
{ "movies": { "$not": { "$size": 3 } } }
]
So you basically "rule out" arrays that are not possibly long enough to have n matches. Note that this $size operator in the query form is different to $size for the aggregation framework. There is no way, for example, to use it with an inequality operator such as $gt, as its purpose is to specifically match the requested "size". Hence this query form, which specifies all of the possible sizes that are less than n.
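Since the list of conditions grows with n, it can be generated rather than typed out. A minimal sketch, where sizeExclusions is a hypothetical helper name and the output is just a plain array to place under "$and":

```javascript
// Build the "$and" conditions that rule out arrays shorter than n,
// i.e. one { $not: { $size: k } } entry for every k from 1 to n - 1.
function sizeExclusions(field, n) {
  const conditions = [];
  for (let size = 1; size < n; size++) {
    conditions.push({ [field]: { $not: { $size: size } } });
  }
  return conditions;
}

// For n = 4 this produces the three conditions listed above.
console.log(JSON.stringify(sizeExclusions("movies", 4), null, 2));
```

For n = 2 the result is the single condition used in the pipelines shown earlier.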
$project : There are a few purposes in this statement, of which some differ depending on the MongoDB version you have. Firstly, and optionally, a document copy is being kept under the _id value so that these fields are not modified by the rest of the steps. The other part here is keeping the "movies" array at the top of the document as a copy for the next stage.
What is also happening in the version presented for pre 2.6 versions is there is an additional array representing the _id values for the "movies" to match. The usage of the $cond operator here is just a way of creating a "literal" representation of the array. Funny enough, MongoDB 2.6 introduces an operator known as $literal to do exactly this without the funny way we are using $cond right here.
$unwind : To do anything further the movies array needs to be unwound as in either case it is the only way to isolate the existing _id values for the entries that need to be matched against the "set". So for the pre 2.6 version you need to "unwind" both of the arrays that are present.
$group : For MongoDB 2.6 and greater you are just grouping back to an array that only contains the _id values of the movies with the "ratings" removed.
Pre 2.6 since all values are presented "side by side" ( and with lots of duplication ) you are doing a comparison of the two values to see if they are the same. Where that is true, this tells the $cond operator statement to return a value of 1 or 0 where the condition is false. This is directly passed back through $sum to total up the number of matching elements in the array to the required "set".
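What the two $unwind stages followed by the $cond inside $sum amount to for one document can be modelled as a nested loop in plain JavaScript. The function name countMatchesByCrossProduct is made up; the data is taken from the sample documents above.

```javascript
// Model of the pre 2.6 approach for a single document: every movie is
// paired with every value of the "set" and equal pairs are counted.
function countMatchesByCrossProduct(movies, set) {
  let total = 0;
  for (const movie of movies) {   // first $unwind
    for (const wanted of set) {   // second $unwind
      total += movie._id === wanted ? 1 : 0; // $cond feeding $sum
    }
  }
  return total;
}

const secondUser = [
  { _id: 1, rating: 5 },
  { _id: 4, rating: 6 },
  { _id: 2, rating: 7 },
];
const thirdUser = [
  { _id: 2, rating: 5 },
  { _id: 5, rating: 6 },
  { _id: 6, rating: 7 },
];

console.log(countMatchesByCrossProduct(secondUser, [1, 2, 3])); // 2
console.log(countMatchesByCrossProduct(thirdUser, [1, 2, 3]));  // 1
```

The loop visits movies.length × set.length pairs, which is the same expansion the pipeline performs.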
$project: Where this is the different part for MongoDB 2.6 and greater is that since you have pushed back an array of the "movies" _id values you are then using $setIntersection to directly compare those arrays. As the result of this is an array containing the elements that are the same, this is then wrapped in a $size operator in order to determine how many elements were returned in that matching set.
$match: Is the final stage that has been implemented here which does the clear step of matching only those documents whose count of intersecting elements was greater than or equal to the required number.
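Taken together, the last two stages of the 2.6 pipeline do the equivalent of the following plain JavaScript for the two candidate users in the sample data. This is a client-side model for illustration only; similarUsers is a hypothetical name and the _id values are shown as plain strings rather than ObjectId instances.

```javascript
const users = [
  {
    _id: "538181738d6bd23253654691",
    movies: [{ _id: 1, rating: 5 }, { _id: 4, rating: 6 }, { _id: 2, rating: 7 }],
  },
  {
    _id: "538181738d6bd23253654692",
    movies: [{ _id: 2, rating: 5 }, { _id: 5, rating: 6 }, { _id: 6, rating: 7 }],
  },
];

// Count the shared movie ids per user (the $setIntersection wrapped in $size),
// then keep the users with at least minMatches (the final $match).
function similarUsers(users, wantedIds, minMatches) {
  const wanted = new Set(wantedIds);
  return users
    .map((user) => {
      const ownIds = new Set(user.movies.map((m) => m._id));
      const shared = [...wanted].filter((id) => ownIds.has(id));
      return { _id: user._id, movies: shared.length };
    })
    .filter((user) => user.movies >= minMatches);
}

console.log(similarUsers(users, [1, 2, 3], 2));
// [ { _id: '538181738d6bd23253654691', movies: 2 } ]
```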
Final
That is basically how you do it. Prior to 2.6 is a bit clunkier and will require a bit more memory due to the expansion that is done by duplicating each array member that is found by all of the possible values of the set, but it still is a valid way to do this.
All you need to do is apply this with the greater n matching values to meet your conditions, and of course make sure your original user match has the required n possibilities. Otherwise just generate this on n-1 from the length of the "user's" array of "movies".