Aggregate on populated value with mongoose - mongodb

I would really appreciate your help with the following scenario.
I have this schema:
var Song = Schema({
author: { type: Schema.Types.ObjectId, ref: 'user' },
title: String,
photo: String,
date: Date,
duration: Number,
views: [{ type: Schema.Types.ObjectId, ref: 'user' }],
likes: [{ type: Schema.Types.ObjectId, ref: 'user' }]
})
var User = mongoose.Schema({
email:String,
name:String,
gender: String,
birthday: String,
city: String,
continent: String
});
I want to write a query that will present the user with the total likes and views of their songs, grouped by continent. For example:
{continent: 'Asia', views:4000, likes:5000},
{continent: 'Europe', views:3200, likes:4500}
Also, I would be happy to know whether this is considered a "heavy query", and whether it might be a smarter idea to save each like and view as a combination of the userId with the continent.

Rather than .populate() ( which is a "client" side operation ) you want the data to "join" on the server, and .aggregate() is a "server" side operation. This is what the $lookup operator is for.
It's probably most optimal to $map a "type" and $concatArrays first, before doing the $lookup:
Song.aggregate([
{ "$project": {
"author": "$author",
"data": {
"$concatArrays": [
{ "$map": {
"input": "$views",
"as": "el",
"in": { "type": "views", "_id": "$$el" }
}},
{ "$map": {
"input": "$likes",
"as": "el",
"in": { "type": "likes", "_id": "$$el" }
}}
]
}
}},
{ "$unwind": "$data" },
{ "$lookup": {
"from": "users",
"localField": "data._id",
"foreignField": "_id",
"as": "data._id"
}},
{ "$unwind": "$data._id" },
{ "$group": {
"_id": {
"author": "$author",
"continent": "$data._id.continent"
},
"views": {
"$sum": { "$cond": [ { "$eq": [ "$data.type", "views" ] }, 1, 0 ] }
},
"likes": {
"$sum": { "$cond": [ { "$eq": [ "$data.type", "likes" ] }, 1, 0 ] }
}
}}
], function(err, results) {
})
You do that "array joining" at the start because at some point you want the "likes" and "views" in a single array. If we tried to deal with them individually with later $unwind operations ( and you need to in order to "count" on the value of "continent" ), you would end up with a "cartesian product", since the contents of one array would be multiplied by the contents of the other.
So whilst "joining" we mark each entry with a "type", since they are no longer in separate fields and we still need to distinguish between "likes" and "views" for counting.
The $lookup operation is capable of working with a "flat" array in the most modern releases, but not with an "array of documents" as is constructed by the first stage. Simply $unwind first to process.
Once the $lookup is done the result in this form will be a single element array for every result contained at the "data._id" path. In order to continue processing we $unwind again.
Finally you $group, where a "compound key" is used for both the "author" and the "continent" values obtained from the joined data. In order to count, each of "likes" and "views" are applied to a $cond expression, which is a ternary ( if/then/else ) operator. Given a condition in the first argument, where that condition is true then the second argument value is returned or when false the third argument.
The results of those expressions are passed to $sum to accumulate, thus when the conditions are matching a positive count is returned and accumulates for the grouping key.
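To make the tagging and counting logic concrete, here is a minimal plain JavaScript sketch of what the pipeline above computes. It runs against small in-memory arrays rather than MongoDB; the sample ids, the `countByContinent` name and the data are all made up for illustration.

```javascript
// Hypothetical in-memory stand-ins for the "users" and "songs" collections.
const users = new Map([
  ["u1", { continent: "Asia" }],
  ["u2", { continent: "Europe" }],
  ["u3", { continent: "Asia" }],
]);
const songs = [
  { author: "a1", views: ["u1", "u2", "u3"], likes: ["u1"] },
  { author: "a1", views: ["u2"], likes: ["u2", "u3"] },
];

function countByContinent(songs, users) {
  const totals = new Map();
  for (const song of songs) {
    // Same idea as $map + $concatArrays: one tagged list instead of two arrays,
    // so views and likes are never multiplied against each other.
    const tagged = [
      ...song.views.map((id) => ({ type: "views", id })),
      ...song.likes.map((id) => ({ type: "likes", id })),
    ];
    for (const { type, id } of tagged) {
      const user = users.get(id); // stands in for $lookup + $unwind
      if (!user) continue; // entries without a matching user are dropped
      const key = `${song.author}|${user.continent}`; // the compound grouping key
      const row = totals.get(key) ??
        { author: song.author, continent: user.continent, views: 0, likes: 0 };
      row[type] += 1; // stands in for $sum over the $cond result (1 or 0)
      totals.set(key, row);
    }
  }
  return [...totals.values()];
}

const result = countByContinent(songs, users);
console.log(result);
```

Each tagged entry contributes to exactly one counter, which is the property the "type" field preserves once the two arrays are merged.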
All aggregations are "heavy" operations, and performing "joins" is really considered even more "heavy".
In a large number of cases there is nothing wrong with your application performing this type of query at run time. It really comes down to if this runs at an effective speed on your data or not. If data is considerably large enough that such operations take excessive time, then you should be "pre-aggregating" by accumulating such summary data in separate records. In this case, incrementing a "like" or "view" count per author per continent or the like.
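As a rough sketch of the "pre-aggregating" idea, a summary record per author and continent could be bumped with $inc each time a like or view is recorded. The helper name `buildCounterUpdate` and the shape of the summary document are assumptions for illustration, not something from the question.

```javascript
// Hypothetical helper: builds the arguments for an upsert that increments
// one counter on a per-author, per-continent summary document.
function buildCounterUpdate(authorId, continent, kind) {
  if (kind !== "views" && kind !== "likes") {
    throw new RangeError(`unknown counter: ${kind}`);
  }
  return {
    filter: { author: authorId, continent },
    update: { $inc: { [kind]: 1 } }, // creates the field at 1 if it is missing
    options: { upsert: true }, // creates the summary document on first use
  };
}

const op = buildCounterUpdate("author-1", "Asia", "likes");
// With a real collection or model this might be passed on as:
//   collection.updateOne(op.filter, op.update, op.options)
console.log(JSON.stringify(op));
```

Reading the totals then becomes a simple find on the summary collection, at the cost of one extra write per like or view.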

Related

How to find documents according to a common field value from another collection in mongodb

Assume I have 2 collections:
student:
{name: Joe, school: A}
{name: Kelly, school: B}
{name: Mike, school: C}
{name: Tom, school: D}
schoolRank: (all the school rank is stored in one document)
{rank: [{school: A, value: 1},{school: B, value: 2},{school: C, value: 3},{school: D, value: 4}]}
Now, my question is how I can find the students whose school rank is higher than 3. (I am a newbie to MongoDB. It seems like I need to use lookup, but I am not sure exactly how to do it.) Thank you in advance!
You need to use $lookup. It is like a "join" in SQL.
But first of all, your document structure could be much better. The schoolRank collection could hold each school in its own document instead of a single array with all the values.
Compare the query against your schema with the query against a schema where schoolRank is split into separate documents.
The second query returns only the document whose school field matches. The other returns the entire array for each document, because every student's school also exists somewhere in the rank array.
So, with your schema you need extra stages. There may be a more efficient way, but I'm not used to doing $lookup against a schema like this (sorry).
I tried this query:
First, $lookup joins both collections (as I said before, the join basically adds the entire array to each document).
Then an extra stage takes the value returned from $lookup, using $set with the element at the first position.
After that, using $project, the query filters the field rank_school and overwrites it to keep only the element whose school field is the same as student.school.
Note that the above steps could be omitted with another schema.
Then, after the $project, there is a $match stage to get the documents whose rank_school.value is greater than or equal to 3.
And the last stage is another $project to remove the field rank_school.
This is the query:
db.student.aggregate([
{
"$lookup": {
"from": "schoolRank",
"localField": "school",
"foreignField": "rank.school",
"as": "rank_school"
}
},
{
"$set": { "rank_school": { "$arrayElemAt": [ "$rank_school", 0 ] } }
},
{
"$project": {
"_id": "$_id",
"name": "$name",
"school": "$school",
"rank_school": {
"$filter": {
"input": "$rank_school.rank",
"as": "rank_school_filter",
"cond": { "$eq": [ "$$rank_school_filter.school", "$school" ] }
}
}
}
},
{
"$match": { "rank_school.value": { "$gte": 3 } }
},
{
"$project": { "rank_school": 0 }
}
])
Example here.
And the output is:
[
{
"_id": ObjectId("5a934e000102030405000003"),
"name": "Mike",
"school": "C"
},
{
"_id": ObjectId("5a934e000102030405000004"),
"name": "Tom",
"school": "D"
}
]
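For comparison, here is a sketch of how the pipeline might shrink if schoolRank stored one document per school, as suggested above. The per-school document shape ({ school, value }) is an assumption; the plain JavaScript filter at the end only illustrates what the pipeline is meant to return and does not talk to MongoDB.

```javascript
// Assumed alternative layout: one schoolRank document per school.
const schoolRank = [
  { school: "A", value: 1 },
  { school: "B", value: 2 },
  { school: "C", value: 3 },
  { school: "D", value: 4 },
];
const student = [
  { name: "Joe", school: "A" },
  { name: "Kelly", school: "B" },
  { name: "Mike", school: "C" },
  { name: "Tom", school: "D" },
];

// With that layout the $set and $filter stages are no longer needed.
const pipeline = [
  { $lookup: { from: "schoolRank", localField: "school", foreignField: "school", as: "rank_school" } },
  { $match: { "rank_school.value": { $gte: 3 } } },
  { $project: { rank_school: 0 } },
];

// Plain JavaScript equivalent of what the pipeline computes, for illustration only.
const rankBySchool = new Map(schoolRank.map((r) => [r.school, r.value]));
const matched = student.filter((s) => (rankBySchool.get(s.school) ?? -Infinity) >= 3);
console.log(matched.map((s) => s.name));
```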

How to perform operations in pipeline in mongo $lookup [duplicate]

I have the following collections:
venue collection
{ "_id" : ObjectId("5acdb8f65ea63a27c1facf86"),
"name" : "ASA College - Manhattan Campus",
"addedBy" : ObjectId("5ac8ba3582c2345af70d4658"),
"reviews" : [
ObjectId("5acdb8f65ea63a27c1facf8b"),
ObjectId("5ad8288ccdd9241781dce698")
]
}
reviews collection
{ "_id" : ObjectId("5acdb8f65ea63a27c1facf8b"),
"createdAt" : ISODate("2018-04-07T12:31:49.503Z"),
"venue" : ObjectId("5acdb8f65ea63a27c1facf86"),
"author" : ObjectId("5ac8ba3582c2345af70d4658"),
"content" : "nice place",
"comments" : [
ObjectId("5ad87113882d445c5cbc92c8")
]
}
comment collection
{ "_id" : ObjectId("5ad87113882d445c5cbc92c8"),
"author" : ObjectId("5ac8ba3582c2345af70d4658"),
"comment" : "dcfdsfdcfdsfdcfdsfdcfdsfdcfdsfdcfdsfdcfdsfdcfdsf",
"review" : ObjectId("5acdb8f65ea63a27c1facf8b"),
"__v" : 0
}
author collection
{ "_id" : ObjectId("5ac8ba3582c2345af70d4658"),
"firstName" : "Bruce",
"lastName" : "Wayne",
"email" : "bruce#linkites.com",
"followers" : [ObjectId("5ac8b91482c2345af70d4650")]
}
Now the following populate query works fine
const venues = await Venue.findOne({ _id: id.id })
.populate({
path: 'reviews',
options: { sort: { createdAt: -1 } },
populate: [
{ path: 'author' },
{ path: 'comments', populate: [{ path: 'author' }] }
]
})
However, I want to achieve it with a $lookup query, but it splits the venue when I do '$unwind' on the reviews... I want the reviews in the same array (like populate) and in the same order...
I want to achieve the following with $lookup because the author has a followers field, so I need to send a field isFollow by doing $project, which cannot be done using populate...
$project: {
isFollow: { $in: [mongoose.Types.ObjectId(req.user.id), '$followers'] }
}
There are a couple of approaches of course depending on your available MongoDB version. These vary from different usages of $lookup through to enabling object manipulation on the .populate() result via .lean().
I do ask that you read the sections carefully, and be aware that all may not be as it seems when considering your implementation solution.
MongoDB 3.6, "nested" $lookup
With MongoDB 3.6 the $lookup operator gets the additional ability to include a pipeline expression as opposed to simply joining a "local" to "foreign" key value. What this means is you can essentially do each $lookup as "nested" within these pipeline expressions:
Venue.aggregate([
{ "$match": { "_id": mongoose.Types.ObjectId(id.id) } },
{ "$lookup": {
"from": Review.collection.name,
"let": { "reviews": "$reviews" },
"pipeline": [
{ "$match": { "$expr": { "$in": [ "$_id", "$$reviews" ] } } },
{ "$lookup": {
"from": Comment.collection.name,
"let": { "comments": "$comments" },
"pipeline": [
{ "$match": { "$expr": { "$in": [ "$_id", "$$comments" ] } } },
{ "$lookup": {
"from": Author.collection.name,
"let": { "author": "$author" },
"pipeline": [
{ "$match": { "$expr": { "$eq": [ "$_id", "$$author" ] } } },
{ "$addFields": {
"isFollower": {
"$in": [
mongoose.Types.ObjectId(req.user.id),
"$followers"
]
}
}}
],
"as": "author"
}},
{ "$addFields": {
"author": { "$arrayElemAt": [ "$author", 0 ] }
}}
],
"as": "comments"
}},
{ "$sort": { "createdAt": -1 } }
],
"as": "reviews"
}},
])
This can be really quite powerful. As you can see, from the perspective of the original pipeline it really only knows about adding content to the "reviews" array, and each subsequent "nested" pipeline expression also only ever sees its "inner" elements from the join.
It is powerful and in some respects it may be a bit clearer as all field paths are relative to the nesting level, but it does start that indentation creep in the BSON structure, and you do need to be aware of whether you are matching to arrays or singular values in traversing the structure.
Note we can also do things here like "flattening the author property" as seen within the "comments" array entries. All $lookup target output may be an "array", but within a "sub-pipeline" we can re-shape that single element array into just a single value.
Standard MongoDB $lookup
Still keeping the "join on the server" you can actually do it with $lookup, but it just takes intermediate processing. This is the long standing approach of deconstructing an array with $unwind and then using $group stages to rebuild arrays:
Venue.aggregate([
{ "$match": { "_id": mongoose.Types.ObjectId(id.id) } },
{ "$lookup": {
"from": Review.collection.name,
"localField": "reviews",
"foreignField": "_id",
"as": "reviews"
}},
{ "$unwind": "$reviews" },
{ "$lookup": {
"from": Comment.collection.name,
"localField": "reviews.comments",
"foreignField": "_id",
"as": "reviews.comments",
}},
{ "$unwind": "$reviews.comments" },
{ "$lookup": {
"from": Author.collection.name,
"localField": "reviews.comments.author",
"foreignField": "_id",
"as": "reviews.comments.author"
}},
{ "$unwind": "$reviews.comments.author" },
{ "$addFields": {
"reviews.comments.author.isFollower": {
"$in": [
mongoose.Types.ObjectId(req.user.id),
"$reviews.comments.author.followers"
]
}
}},
// After the $unwind stages the review fields live under "reviews" (plural),
// so the first $group has to read from "$reviews.*"
{ "$group": {
"_id": {
"_id": "$_id",
"reviewId": "$reviews._id"
},
"name": { "$first": "$name" },
"addedBy": { "$first": "$addedBy" },
"review": {
"$first": {
"_id": "$reviews._id",
"createdAt": "$reviews.createdAt",
"venue": "$reviews.venue",
"author": "$reviews.author",
"content": "$reviews.content"
}
},
"comments": { "$push": "$reviews.comments" }
}},
{ "$sort": { "_id._id": 1, "review.createdAt": -1 } },
{ "$group": {
"_id": "$_id._id",
"name": { "$first": "$name" },
"addedBy": { "$first": "$addedBy" },
"reviews": {
"$push": {
"_id": "$review._id",
"venue": "$review.venue",
"author": "$review.author",
"content": "$review.content",
"comments": "$comments"
}
}
}}
])
This really is not as daunting as you might think at first and follows a simple pattern of $lookup and $unwind as you progress through each array.
The "author" detail of course is singular, so once that is "unwound" you simply want to leave it that way, make the field addition and start the process of "rolling back" into the arrays.
There are only two levels to reconstruct back to the original Venue document, so the first detail level is by Review to rebuild the "comments" array. All you need to do is $push the path of "$reviews.comments" in order to collect these, and as long as the "$reviews._id" field is in the "grouping _id" the only other things you need to keep are all the other fields. You can put all of these into the _id as well, or you can use $first.
With that done there is only one more $group stage in order to get back to Venue itself. This time the grouping key is "$_id" of course, with all properties of the venue itself using $first and the remaining "$review" details going back into an array with $push. Of course the "$comments" output from the previous $group becomes the "review.comments" path.
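The two "roll back" steps can be sketched in plain JavaScript over rows shaped like the output of the $unwind stages. The sample rows and variable names below are made up, and the Map based grouping only mirrors the idea of the two $group stages, not their execution.

```javascript
// Hypothetical rows as they might look after both $unwind stages:
// one row per (venue, review, comment).
const rows = [
  { _id: "v1", name: "Venue 1", reviews: { _id: "r1", content: "nice", comments: { _id: "c1" } } },
  { _id: "v1", name: "Venue 1", reviews: { _id: "r1", content: "nice", comments: { _id: "c2" } } },
  { _id: "v1", name: "Venue 1", reviews: { _id: "r2", content: "ok", comments: { _id: "c3" } } },
];

// First roll back: one entry per (venue, review), collecting the comments.
const byReview = new Map();
for (const row of rows) {
  const key = `${row._id}|${row.reviews._id}`;
  if (!byReview.has(key)) {
    const { comments, ...review } = row.reviews; // keep review fields, like $first
    byReview.set(key, { venueId: row._id, name: row.name, review, comments: [] });
  }
  byReview.get(key).comments.push(row.reviews.comments); // like $push
}

// Second roll back: one entry per venue, collecting the rebuilt reviews.
const byVenue = new Map();
for (const { venueId, name, review, comments } of byReview.values()) {
  if (!byVenue.has(venueId)) byVenue.set(venueId, { _id: venueId, name, reviews: [] });
  byVenue.get(venueId).reviews.push({ ...review, comments });
}

const venues = [...byVenue.values()];
console.log(JSON.stringify(venues, null, 2));
```

A JavaScript Map keeps insertion order, whereas the pipeline places an explicit $sort between its two $group stages to control the order of the rebuilt reviews.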
Working on a single document and its relations, this is not really so bad. The $unwind pipeline operator can generally be a performance issue, but in the context of this usage it should not really cause that much of an impact.
Since the data is still being "joined on the server" there is still far less traffic than the other remaining alternative.
JavaScript Manipulation
Of course the other case here is that instead of changing data on the server itself, you actually manipulate the result. In most cases I would be in favor of this approach since any "additions" to the data are probably best handled on the client.
The problem of course with using populate() is that whilst it may 'look like' a much more simplified process, it is in fact NOT A JOIN in any way. All populate() actually does is "hide" the underlying process of submitting multiple queries to the database, and then awaiting the results through async handling.
So the "appearance" of a join is actually the result of multiple requests to the server and then doing "client side manipulation" of the data to embed the details within arrays.
So aside from that clear warning that the performance characteristics are nowhere close to being on par with a server $lookup, the other caveat is of course that the "mongoose Documents" in the result are not actually plain JavaScript objects subject to further manipulation.
So in order to take this approach, you need to add the .lean() method to the query before execution, in order to instruct mongoose to return "plain JavaScript objects" instead of Document types which are cast with schema methods attached to the model. Noting of course that the resulting data no longer has access to any "instance methods" that would otherwise be associated with the related models themselves:
let venue = await Venue.findOne({ _id: id.id })
.populate({
path: 'reviews',
options: { sort: { createdAt: -1 } },
populate: [
{ path: 'comments', populate: [{ path: 'author' }] }
]
})
.lean();
Now venue is a plain object, we can simply process and adjust as needed:
venue.reviews = venue.reviews.map( r =>
({
...r,
comments: r.comments.map( c =>
({
...c,
author: {
...c.author,
isFollower: c.author.followers.map( f => f.toString() ).indexOf(req.user.id) != -1
}
})
)
})
);
So it's really just a matter of cycling through each of the inner arrays down until the level where you can see the followers array within the author details. The comparison then can be made against the ObjectId values stored in that array after first using .map() to return the "string" values for comparison against the req.user.id which is also a string (if it is not, then also add .toString() on that ), since it is easier in general to compare these values in this way via JavaScript code.
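As a self-contained sketch of that string comparison, the function below marks each comment author with a flag, using the isFollower name from the aggregation examples. The `markFollowers` and `fakeId` names are hypothetical, and `fakeId` merely imitates an ObjectId by providing a toString() method.

```javascript
// Returns a copy of the venue with an isFollower flag on every comment author.
function markFollowers(venue, userId) {
  const uid = String(userId); // works whether userId is a string or an id object
  return {
    ...venue,
    reviews: venue.reviews.map((r) => ({
      ...r,
      comments: r.comments.map((c) => ({
        ...c,
        author: {
          ...c.author,
          // Compare by string value, since two id objects are never === each other.
          isFollower: c.author.followers.some((f) => String(f) === uid),
        },
      })),
    })),
  };
}

// Hypothetical stand-in for an ObjectId: only toString() matters here.
const fakeId = (hex) => ({ toString: () => hex });

const venue = {
  name: "Sample venue",
  reviews: [{
    content: "nice place",
    comments: [
      { comment: "first", author: { firstName: "Bruce", followers: [fakeId("aaa"), fakeId("bbb")] } },
      { comment: "second", author: { firstName: "Clark", followers: [] } },
    ],
  }],
};

const marked = markFollowers(venue, "bbb");
console.log(marked.reviews[0].comments.map((c) => c.author.isFollower));
```

Because every level is copied with spread syntax, the object returned by .lean() is left untouched.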
Again though, I need to stress that it "looks simple" but it is in fact the sort of thing you really want to avoid for system performance. Those additional queries and the transfer between the server and the client cost a lot of processing time, and the request overhead adds up to real transport costs between hosting providers.
Summary
Those are basically the approaches you can take, short of "rolling your own" where you actually perform the "multiple queries" to the database yourself instead of using the helper that .populate() is.
Using the populate output, you can then simply manipulate the data in result just like any other data structure, as long as you apply .lean() to the query to convert or otherwise extract the plain object data from the mongoose documents returned.
Whilst the aggregate approaches look far more involved, there are "a lot" more advantages to doing this work on the server. Larger result sets can be sorted, calculations can be done for further filtering, and of course you get a "single response" to a "single request" made to the server, all with no additional overhead.
It is totally arguable that the pipelines themselves could simply be constructed based on attributes already stored on the schema. So writing your own method to perform this "construction" based on the attached schema should not be too difficult.
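One possible shape for such a "construction" method is sketched below. It does not read a mongoose schema; it only shows how nested $lookup stages could be generated from a small description. The `lookupStage` name, its parameters and the collection names are assumptions for illustration.

```javascript
// Builds one "nested" $lookup stage (MongoDB 3.6+ pipeline form).
// isArray says whether the local field holds an array of ids or a single id.
function lookupStage({ from, field, isArray, inner = [] }) {
  const op = isArray ? "$in" : "$eq";
  return {
    $lookup: {
      from,
      let: { ref: `$${field}` }, // expose the local field to the sub-pipeline
      pipeline: [
        { $match: { $expr: { [op]: ["$_id", "$$ref"] } } },
        ...inner, // any stages to run on the joined documents
      ],
      as: field,
    },
  };
}

// Assumed collection names; compose from the innermost join outwards.
const authorLookup = lookupStage({ from: "authors", field: "author", isArray: false });
const commentsLookup = lookupStage({
  from: "comments", field: "comments", isArray: true, inner: [authorLookup],
});
const reviewsLookup = lookupStage({
  from: "reviews", field: "reviews", isArray: true,
  inner: [commentsLookup, { $sort: { createdAt: -1 } }],
});

console.log(JSON.stringify(reviewsLookup, null, 2));
```

The resulting `reviewsLookup` object could then be placed after a $match stage in an aggregate call, in the same position as the hand-written $lookup in the 3.6 example.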
In the longer term of course $lookup is the better solution, but you'll probably need to put a little more work into the initial coding, if of course you don't just simply copy from what is listed here ;)

Inner Join on two Fields

I have the following schemas
var User = mongoose.Schema({
email:{type: String, trim: true, index: true, unique: true, sparse: true},
password: String,
name:{type: String, trim: true, index: true, unique: true, sparse: true},
gender: String,
});
var Song = Schema({
track: { type: Schema.Types.ObjectId, ref: 'track' },//Track can be deleted
author: { type: Schema.Types.ObjectId, ref: 'user' },
url: String,
title: String,
photo: String,
publishDate: Date,
views: [{ type: Schema.Types.ObjectId, ref: 'user' }],
likes: [{ type: Schema.Types.ObjectId, ref: 'user' }],
collaborators: [{ type: Schema.Types.ObjectId, ref: 'user' }],
});
I want to select all users (without the password value), but I want each user to have all the songs where they are the author or one of the collaborators and that were published in the last 2 weeks.
What is the best strategy to perform this action (binding between user.id and song.collaborators)? Can it be done in one select?
It's very possible in one request, and the basic tool for this with MongoDB is $lookup.
I would think this actually makes more sense to query from the Song collection instead, since your criteria is that they must be listed in one of two properties on that collection.
Optimal INNER Join - Reversed
Presuming the actual "model" names are what is listed above:
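var today = Date.now(), // milliseconds since the epoch; Date.now is a function, not a constructor
oneDay = 1000 * 60 * 60 * 24,
twoWeeksAgo = new Date(today - ( oneDay * 14 ));
var userIds; // Should be assigned as an 'Array' of ObjectId values, even if only one ( aggregate pipelines are not cast by mongoose )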
Song.aggregate([
{ "$match": {
"$or": [
{ "author": { "$in": userIds } },
{ "collaborators": { "$in": userIds } }
],
"publishDate": { "$gt": twoWeeksAgo } // field name as declared in the Song schema
}},
{ "$addFields": {
"users": {
"$setIntersection": [
userIds,
{ "$setUnion": [ ["$author"], "$collaborators" ] }
]
}
}},
{ "$lookup": {
"from": User.collection.name,
"localField": "users",
"foreignField": "_id",
"as": "users"
}},
{ "$unwind": "$users" },
{ "$group": {
"_id": "$users._id",
"email": { "$first": "$users.email" },
"name": { "$first": "$users.name" },
"gender": { "$first": "$users.gender" },
"songs": {
"$push": {
"_id": "$_id",
"track": "$track",
"author": "$author",
"url": "$url",
"title": "$title",
"photo": "$photo",
"publishDate": "$publishDate",
"views": "$views",
"likes": "$likes",
"collaborators": "$collaborators"
}
}
}}
])
That to me is the most logical course as long as it's an "INNER JOIN" you want from the results, meaning that "all users MUST have a mention on at least one song" in the two properties involved.
The $setUnion takes the "unique list" ( ObjectId is unique anyway ) of combining those two. So if an "author" is also a "collaborator" then they are only listed once for that song.
The $setIntersection "filters" the list from that combined list to only those that were specified in the query condition. This removes any other "collaborator" entries that would not have been in the selection.
The $lookup does the "join" on that combined data to get the users, and the $unwind is done because you want the User to be the main detail. So we basically reverse the "array of users" into "array of songs" in the result.
Also, since the main criteria is from Song, then it makes sense to query from that collection as the direction.
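The "reversal" performed by $unwind and $group can be mimicked in plain JavaScript. This is only a sketch with made-up data (the songs, users and field values below are hypothetical), meant to show the shape change rather than how the server implements it.

```javascript
// Songs as they would look after $lookup: each carries an array of joined users.
const songs = [
  { _id: "s1", title: "First", users: [{ _id: "u1", name: "Ann" }, { _id: "u2", name: "Bob" }] },
  { _id: "s2", title: "Second", users: [{ _id: "u1", name: "Ann" }] },
];

// $unwind: one row per (song, user) pair.
const unwound = songs.flatMap(({ users, ...song }) =>
  users.map((user) => ({ ...song, users: user }))
);

// $group by the user _id, pushing each song into a "songs" array.
const byUser = new Map();
for (const row of unwound) {
  const key = row.users._id;
  if (!byUser.has(key)) {
    byUser.set(key, { _id: key, name: row.users.name, songs: [] });
  }
  byUser.get(key).songs.push({ _id: row._id, title: row.title });
}
const result = [...byUser.values()];

console.log(JSON.stringify(result, null, 2));
```

Each user now appears once, carrying the list of songs that mentioned them, which is the same orientation the pipeline produces.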
Optional LEFT Join
Doing this the other way around is where the "LEFT JOIN" is wanted, being "ALL Users" regardless if there are any associated songs or not:
User.aggregate([
{ "$lookup": {
"from": Song.collection.name,
"localField": "_id",
"foreignField": "author",
"as": "authors"
}},
{ "$lookup": {
"from": Song.collection.name,
"localField": "_id",
"foreignField": "collaborators",
"as": "collaborators"
}},
{ "$project": {
"email": 1,
"name": 1,
"gender": 1,
"songs": { "$setUnion": [ "$authors", "$collaborators" ] }
}}
])
So the listing of the statement "looks" shorter, but it is forcing "two" $lookup stages in order to obtain results for possible "authors" and "collaborators" rather than one. So the actual "join" operations can be costly in execution time.
The rest is pretty straightforward in applying the same $setUnion, but this time to the "result arrays" rather than the original source of the data.
If you wanted similar "query" conditions to above on the "filter" for the "songs" and not the actual User documents returned, then for LEFT Join you actually $filter the array content "post" $lookup:
User.aggregate([
{ "$lookup": {
"from": Song.collection.name,
"localField": "_id",
"foreignField": "author",
"as": "authors"
}},
{ "$lookup": {
"from": Song.collection.name,
"localField": "_id",
"foreignField": "collaborators",
"as": "collaborators"
}},
{ "$project": {
"email": 1,
"name": 1,
"gender": 1,
"songs": {
"$filter": {
"input": { "$setUnion": [ "$authors", "$collaborators" ] },
"as": "s",
"cond": {
"$and": [
{ "$setIsSubset": [
userIds,
{ "$setUnion": [ ["$$s.author"], "$$s.collaborators" ] }
]},
{ "$gte": [ "$$s.publishedDate", twoWeeksAgo ] }
]
}
}
}
}}
])
Which would mean that by LEFT JOIN Conditions, ALL User documents are returned but the only ones which will contain any "songs" will be those that met the "filter" conditions as being part of the supplied userIds. And even those users which were contained in the list will only show those "songs" within the required range for publishedDate.
The main addition within the $filter is the $setIsSubset operator, which is a short way of comparing the supplied list in userIds to the "combined" list from the two fields present in the document. Note here that the "current user" already had to be "related" due to the earlier conditions of each $lookup.
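As a rough illustration of what $setIsSubset evaluates, the following plain-JavaScript sketch imitates the comparison. The setIsSubset and people helpers and the sample songs are hypothetical; they only mirror the operator's "every element of the first set is in the second" rule.

```javascript
// Imitation of $setIsSubset: true when every element of `a` is also in `b`.
function setIsSubset(a, b) {
  const superset = new Set(b);
  return a.every((item) => superset.has(item));
}

const userIds = ["u1", "u2"]; // hypothetical ids supplied to the query

// Hypothetical songs as returned by the two $lookup stages.
const songA = { author: "u1", collaborators: ["u2", "u3"] };
const songB = { author: "u1", collaborators: ["u3"] };

// Combined list of everyone mentioned on a song.
const people = (song) => [song.author, ...song.collaborators];

console.log(setIsSubset(userIds, people(songA))); // true
console.log(setIsSubset(userIds, people(songB))); // false: "u2" is missing
```

One consequence worth keeping in mind: with more than one id in the list, every id has to appear on the song, which is stricter than an $in style "any of these" match.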
MongoDB 3.6 Preview
A new "sub-pipeline" syntax available for $lookup from the MongoDB 3.6 release means that rather than "two" $lookup stages as shown for the LEFT Join variant, you can in fact structure this as a "sub-pipeline", which also optimally filters content before returning results:
User.aggregate([
{ "$lookup": {
"from": Song.collection.name,
"let": {
"user": "$_id"
},
"pipeline": [
{ "$match": {
"$or": [
{ "author": { "$in": userIds } },
{ "collaborators": { "$in": userIds } }
],
"publishedDate": { "$gt": twoWeeksAgo },
"$expr": {
"$or": [
{ "$eq": [ "$$user", "$author" ] },
{ "$setIsSubset": [ ["$$user"], "$collaborators" ] }
]
}
}}
],
"as": "songs"
}}
])
And that is all there is to it in that case, since $expr allows the $$user variable declared in "let" to be compared with each entry in the song collection, selecting only those that match in addition to the other query criteria. The result is only the matching songs per user, or an empty array. This makes the whole "sub-pipeline" simply a $match expression, so the join is driven by additional query logic rather than by fixed local and foreign keys.
So you could even add a stage to the pipeline following $lookup to filter out any "empty" array results, making the overall result an INNER Join.
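A minimal sketch of such a stage, assuming the $lookup output field is named "songs" as in the listing above. The variable names and the in-memory sample are hypothetical and only demonstrate the effect on plain objects.

```javascript
// Stage to append after the $lookup so that users without songs are dropped.
const dropUsersWithoutSongs = { $match: { songs: { $ne: [] } } };

// The same rule applied to plain objects, only to show the effect.
const joined = [
  { name: "Ann", songs: [{ title: "First" }] },
  { name: "Bob", songs: [] },
];
const innerJoined = joined.filter((user) => user.songs.length > 0);

console.log(innerJoined.map((user) => user.name)); // [ 'Ann' ]
```

The stage object would simply be added as the last element of the array passed to User.aggregate().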
So personally I would go for the first approach when you can and only use the second approach where you need to.
NOTE: There are a couple of options that don't really apply here. The first is the special case of $lookup + $unwind + $match coalescence: whilst that applies to the initial INNER Join example, it cannot be applied to the LEFT Join case.
This is because in order for a LEFT Join to be obtained, the usage of $unwind must be implemented with preserveNullAndEmptyArrays: true, and this breaks the rule of application in that the unwinding and matching cannot be "rolled up" within the $lookup and applied to the foreign collection "before" returning results.
Hence why it is not applied in the sample and we use $filter on the returned array instead, since there is no optimal action that can be applied to the foreign collection "before" the results are returned, and nothing stopping all results for songs matching on simply the foreign key from returning. INNER Joins are of course different.
The other case is .populate() with mongoose. The most important distinction being that .populate() is not a single request, but just a programming "shorthand" for actually issuing multiple queries. So at any rate, there would actually be multiple queries issued and always requiring ALL results in order to apply any filtering.
Which leads to the limitation on where the filtering is actually applied, and generally means that you cannot really implement "paging" concepts when you utilize "client side joins" that require conditions to be applied on the foreign collection.
There are some more details on this in "Querying after populate in Mongoose", along with an actual demonstration of how the basic functionality can be wired in as a custom method in mongoose schemas, but actually using the $lookup pipeline processing underneath.

Return only matched sub-document elements within a nested array

The main collection is retailer, which contains an array of stores. Each store contains an array of offers (which you can buy in this store). Each offer has an array of sizes. (See example below)
Now I am trying to find all offers that are available in size L.
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"stores" : [
{
"_id" : ObjectId("56f277b5279871c20b8b4783"),
"offers" : [
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"size": [
"XS",
"S",
"M"
]
},
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"size": [
"S",
"L",
"XL"
]
}
]
}
]
}
I've tried this query: db.getCollection('retailers').find({'stores.offers.size': 'L'})
I expect output like this:
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"stores" : [
{
"_id" : ObjectId("56f277b5279871c20b8b4783"),
"offers" : [
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"size": [
"S",
"L",
"XL"
]
}
]
}
]
}
But the output of my query also contains the non-matching offer with sizes XS, S and M.
How can I force MongoDB to return only the offers that matched my query?
Greetings and thanks.
So the query you have actually selects the "document" just like it should. But what you are looking for is to "filter the arrays" contained so that the elements returned only match the condition of the query.
The real answer is of course that unless you are really saving a lot of bandwidth by filtering out such detail then you should not even try, or at least beyond the first positional match.
MongoDB has a positional $ operator which will return an array element at the matched index from a query condition. However, this only returns the "first" matched index of the "outer" most array element.
db.getCollection('retailers').find(
{ 'stores.offers.size': 'L'},
{ 'stores.$': 1 }
)
In this case, it means the "stores" array position only. So if there were multiple "stores" entries, then only "one" of the elements that contained your matched condition would be returned. But that does nothing for the inner array of "offers", and as such every "offer" within the matched "stores" element would still be returned.
MongoDB has no way of "filtering" this in a standard query, so the following does not work:
db.getCollection('retailers').find(
{ 'stores.offers.size': 'L'},
{ 'stores.$.offers.$': 1 }
)
The only tool MongoDB actually has for this level of manipulation is the aggregation framework. But the analysis should show you why you "probably" should not do this, and instead just filter the array in code.
Here is how you can achieve this per version, in order.
First with MongoDB 3.2.x, using the $filter operation:
db.getCollection('retailers').aggregate([
{ "$match": { "stores.offers.size": "L" } },
{ "$project": {
"stores": {
"$filter": {
"input": {
"$map": {
"input": "$stores",
"as": "store",
"in": {
"_id": "$$store._id",
"offers": {
"$filter": {
"input": "$$store.offers",
"as": "offer",
"cond": {
"$setIsSubset": [ ["L"], "$$offer.size" ]
}
}
}
}
}
},
"as": "store",
"cond": { "$ne": [ "$$store.offers", [] ]}
}
}
}}
])
Then with MongoDB 2.6.x and above with $map and $setDifference:
db.getCollection('retailers').aggregate([
{ "$match": { "stores.offers.size": "L" } },
{ "$project": {
"stores": {
"$setDifference": [
{ "$map": {
"input": {
"$map": {
"input": "$stores",
"as": "store",
"in": {
"_id": "$$store._id",
"offers": {
"$setDifference": [
{ "$map": {
"input": "$$store.offers",
"as": "offer",
"in": {
"$cond": {
"if": { "$setIsSubset": [ ["L"], "$$offer.size" ] },
"then": "$$offer",
"else": false
}
}
}},
[false]
]
}
}
}
},
"as": "store",
"in": {
"$cond": {
"if": { "$ne": [ "$$store.offers", [] ] },
"then": "$$store",
"else": false
}
}
}},
[false]
]
}
}}
])
And finally in any version above MongoDB 2.2.x where the aggregation framework was introduced.
db.getCollection('retailers').aggregate([
{ "$match": { "stores.offers.size": "L" } },
{ "$unwind": "$stores" },
{ "$unwind": "$stores.offers" },
{ "$match": { "stores.offers.size": "L" } },
{ "$group": {
"_id": {
"_id": "$_id",
"storeId": "$stores._id",
},
"offers": { "$push": "$stores.offers" }
}},
{ "$group": {
"_id": "$_id._id",
"stores": {
"$push": {
"_id": "$_id.storeId",
"offers": "$offers"
}
}
}}
])
Let's break down the explanations.
MongoDB 3.2.x and greater
So generally speaking, $filter is the way to go here since it is designed with this purpose in mind. Since there are multiple levels of the array, you need to apply this at each level. So first you are diving into each "offers" within "stores" to examine and $filter that content.
The simple comparison here is "Does the "size" array contain the element I am looking for". In this logical context, the short thing to do is use the $setIsSubset operation to compare an array ("set") of ["L"] to the target array. Where that condition is true ( it contains "L" ) then the array element for "offers" is retained and returned in the result.
In the higher level $filter, you are then looking to see if the result from that previous $filter returned an empty array [] for "offers". If it is not empty, then the element is returned or otherwise it is removed.
MongoDB 2.6.x
This is very similar to the modern process except that since there is no $filter in this version you can use $map to inspect each element and then use $setDifference to filter out any elements that were returned as false.
So $map is going to return the whole array, but the $cond operation just decides whether to return the element or instead a false value. In the comparison of $setDifference to a single element "set" of [false] all false elements in the returned array would be removed.
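The $map plus $setDifference trick can be imitated in plain JavaScript. This is a sketch only: the setDifference helper is hypothetical, compares elements by their JSON form as a simplification, and the offers are made-up data shaped like the sample document.

```javascript
// Imitation of $setDifference: unique elements of `a` that are not in `b`.
// Elements are compared by their JSON form, a simplification for this sketch.
function setDifference(a, b) {
  const exclude = new Set(b.map((x) => JSON.stringify(x)));
  const seen = new Set();
  return a.filter((x) => {
    const key = JSON.stringify(x);
    if (exclude.has(key) || seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Hypothetical offers, shaped like the sample document.
const offers = [
  { _id: 1, size: ["XS", "S", "M"] },
  { _id: 2, size: ["S", "L", "XL"] },
];

// $map with $cond: keep the offer, or put `false` in its place.
const mapped = offers.map((offer) => (offer.size.includes("L") ? offer : false));

// $setDifference against [false] strips the placeholders.
const kept = setDifference(mapped, [false]);

console.log(mapped); // [ false, { _id: 2, size: [ 'S', 'L', 'XL' ] } ]
console.log(kept);   // [ { _id: 2, size: [ 'S', 'L', 'XL' ] } ]
```

Because this is a "set" operation, identical elements would collapse into one, so the approach relies on the array elements being distinct, which the _id values guarantee here.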
In all other ways, the logic is the same as above.
MongoDB 2.2.x and up
So below MongoDB 2.6 the only tool for working with arrays is $unwind, and you should not use the aggregation framework "just" for this purpose.
The process indeed appears simple, by simply "taking apart" each array, filtering out the things you don't need then putting it back together. The main care is in the "two" $group stages, with the "first" to re-build the inner array, and the next to re-build the outer array. There are distinct _id values at all levels, so these just need to be included at every level of grouping.
But the problem is that $unwind is very costly. Though it still has a purpose, its main intended usage is not this sort of filtering per document. In fact in modern releases its only usage should be when an element of the array(s) needs to become part of the "grouping key" itself.
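To see why the "take apart and rebuild" route does so much work, here is a plain-JavaScript sketch of the same steps on one made-up retailer. All ids and data are hypothetical; the point is that every (store, offer) pair becomes its own row before anything is filtered.

```javascript
// Hypothetical retailer shaped like the sample document.
const retailer = {
  _id: "r1",
  stores: [
    { _id: "s1", offers: [
      { _id: "o1", size: ["XS", "S", "M"] },
      { _id: "o2", size: ["S", "L", "XL"] },
    ]},
    { _id: "s2", offers: [{ _id: "o3", size: ["M"] }] },
  ],
};

// Two $unwind stages: one row per (store, offer) pair.
const rows = retailer.stores.flatMap((store) =>
  store.offers.map((offer) => ({ _id: retailer._id, storeId: store._id, offer }))
);

// Second $match: keep only rows whose offer has size "L".
const matched = rows.filter((row) => row.offer.size.includes("L"));

// First $group: rebuild "offers" per (retailer, store).
const stores = new Map();
for (const row of matched) {
  if (!stores.has(row.storeId)) {
    stores.set(row.storeId, { _id: row.storeId, offers: [] });
  }
  stores.get(row.storeId).offers.push(row.offer);
}

// Second $group: rebuild "stores" per retailer.
const rebuilt = { _id: retailer._id, stores: [...stores.values()] };

console.log(rows.length); // 3
console.log(JSON.stringify(rebuilt));
```

Three intermediate rows for one small document is harmless, but the row count grows with the product of the array lengths across the whole collection, which is where the cost comes from.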
Conclusion
So it's not a simple process to get matches at multiple levels of an array like this, and in fact it can be extremely costly if implemented incorrectly.
Only the two modern listings should ever be used for this purpose, as they employ a "single" pipeline stage in addition to the "query" $match in order to do the "filtering". The resulting effect is little more overhead than the standard forms of .find().
In general though, those listings still have an amount of complexity to them, and indeed unless you are really drastically reducing the content returned by such filtering in a way that makes a significant improvement in bandwidth used between the server and client, then you are better off filtering the result of the initial query and basic projection in code.
db.getCollection('retailers').find(
{ 'stores.offers.size': 'L'},
{ 'stores.$': 1 }
).forEach(function(doc) {
// Technically this is only "one" store. So omit the projection
// if you wanted more than "one" match
doc.stores = doc.stores.filter(function(store) {
store.offers = store.offers.filter(function(offer) {
return offer.size.indexOf("L") != -1;
});
return store.offers.length != 0;
});
printjson(doc);
})
So working with the returned object "post" query processing is far less obtuse than using the aggregation pipeline to do this. And as stated the only "real" difference would be that you are discarding the other elements on the "server" as opposed to removing them "per document" when received, which may save a little bandwidth.
But unless you are doing this in a modern release with only $match and $project, then the "cost" of processing on the server will greatly outweigh the "gain" of reducing that network overhead by stripping the unmatched elements first.
In all cases, you get the same result:
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"stores" : [
{
"_id" : ObjectId("56f277b5279871c20b8b4783"),
"offers" : [
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"size" : [
"S",
"L",
"XL"
]
}
]
}
]
}
As your array is embedded we cannot use $elemMatch; instead you can use the aggregation framework to get your results:
db.retailers.aggregate([
{$match:{"stores.offers.size": 'L'}}, //just precondition can be skipped
{$unwind:"$stores"},
{$unwind:"$stores.offers"},
{$match:{"stores.offers.size": 'L'}},
{$group:{
_id:{id:"$_id", "storesId":"$stores._id"},
"offers":{$push:"$stores.offers"}
}},
{$group:{
_id:"$_id.id",
stores:{$push:{_id:"$_id.storesId","offers":"$offers"}}
}}
]).pretty()
What this query does is unwind the arrays (twice), then match on size, and then reshape the document to its previous form. You can remove the $group steps and see how it prints.
Have fun!
It also works without aggregate, since find() accepts aggregation expressions in its projection from MongoDB 4.4 onwards.
Here is the solution link: https://mongoplayground.net/p/Q5lxPvGK03A
db.collection.find({
"stores.offers.size": "L"
},
{
"stores": {
"$filter": {
"input": {
"$map": {
"input": "$stores",
"as": "store",
"in": {
"_id": "$$store._id",
"offers": {
"$filter": {
"input": "$$store.offers",
"as": "offer",
"cond": {
"$setIsSubset": [
[
"L"
],
"$$offer.size"
]
}
}
}
}
}
},
"as": "store",
"cond": {
"$ne": [
"$$store.offers",
[]
]
}
}
}
})