MongoDB: Populate reference in $group when aggregating? - mongodb

I have a collection that I need to group by year. My aggregation pipeline is as such:
const WorkHistoryByYear = await this.workHistroyModel.aggregate([
{
$group: {
_id: '$year',
history: {
$push: '$$ROOT'
}
}
}
]);
Which works as expected, returning:
[{
"_id": 2003,
"history": [
{
"_id": "600331b3d84ac418877a0e5a",
"tasksPerformed": [
"5fffb180a477c4f78ad67331",
"5fffb18aa477c4f78ad67332"
],
"year": 2003
},
{
"_id": "600331dcd84ac418877a0e5c",
"tasksPerformed": [
"5fffb180a477c4f78ad67331"
],
"year": 2003
}
]
}]
but I'd like to populate a field if possible.
The WorkHistory schema has a field, tasksPerformed which is an array of ObjectId references. Here is the Task schema:
export const TaskSchema = new Schema({
active: {
type: Schema.Types.Boolean,
default: true,
},
title: {
type: Schema.Types.String,
required: true,
},
order: {
type: Schema.Types.Number,
index: true,
}
});
Is it possible to populate the referenced models within the aggregation? $lookup seems to be what I need, but I have yet to get that to work when following the documentation.
I don't do a lot of database work, so I'm having some difficulty finding the right operator(s) to use, and I've seen similar questions, but not a definitive answer.
Edit:
After adding the code from #varman's answer, my return is now:
{
"_id": 2003,
"history": {
"_id": "600331b3d84ac418877a0e5a",
"tasksPerformed": [
"5fffb180a477c4f78ad67331",
"5fffb18aa477c4f78ad67332"
],
"year": 2003,
"history": {
"tasksPerformed": []
}
}
}
I converted the ObjectId references to strings in an effort to help the matching, but I'm still coming up short.

You can do the lookup to join both collections
$unwind to deconstruct the array. (Array to Objects)
$lookup to join collections
$group to reconstruct the deconstructed array again
The script for the above result is
db.workHistory.aggregate([
{
"$unwind": "$history"
},
{
"$lookup": {
"from": "taskSchema",
"localField": "history.tasksPerformed",
"foreignField": "_id",
"as": "history.tasksPerformed"
}
},
{
"$group": {
"_id": "$_id",
"history": {
"$push": "$history"
}
}
}
])
Working Mongo playground
But before grouping, you have collection look like this
db={
"workHistory": [
{
"_id": 2003,
"history": [
{
"_id": "600331b3d84ac418877a0e5a",
"tasksPerformed": [
"5fffb180a477c4f78ad67331",
"5fffb18aa477c4f78ad67332"
],
"year": 2003
},
{
"_id": "600331dcd84ac418877a0e5c",
"tasksPerformed": [
"5fffb180a477c4f78ad67331"
],
"year": 2003
}
]
}
],
"taskSchema": [
{
"_id": "5fffb180a477c4f78ad67331",
"active": true,
"title": "first"
},
{
"_id": "5fffb18aa477c4f78ad67332",
"active": true,
"title": "second"
}
]
}
Since $unwind is expensive, we could have done aggregation
db.workHistory.aggregate([
{
"$lookup": {
"from": "taskSchema",
"localField": "tasksPerformed",
"foreignField": "_id",
"as": "tasksPerformed"
}
},
{
$group: {
_id: "$year",
history: {
$push: "$$ROOT"
}
}
}
])
Working Mongo playground

you could also do it without an unwind:
db.WorkHistory.aggregate(
[
{
$lookup: {
from: "Tasks",
localField: "tasksPerformed",
foreignField: "_id",
as: "tasksPerformed"
}
},
{
$group: {
_id: "$year",
history: { $push: "$$ROOT" }
}
}
])

Related

MongoDB aggregate $lookup with _ID's from array

I have two collections in MongoDB: "carts" and another "products"
Carts:
[{
"_id": {
"$oid": "62af0fefebc0b42a875c7df1"
},
"uuid": "6ca05ae0-522a-4db3-b380-2d2330ee1e27",
"cartitems": [
"62a0b24680cc2891148daf7b",
"62a7339d91d01868921afa0a",
"62a72f7191d01868921afa08",
"62a7330291d01868921afa09"
],
"created": "2022-06-19T14:00:47.846958537+02:00[Europe/Brussels]",
"lastupdate": "2022-06-19T14:01:06.15165564+02:00[Europe/Brussels]"
},
{...},
{...}]
products:
[{
"_id": {
"$oid": "62a0b24680cc2891148daf7b"
},
"name": "Product1",
"created": "2022-06-11T09:41:54.461308647+02:00[Europe/Brussels]",
"category": "Workshops",
"pricein": "28900",
"lastupdate": "2022-06-17T16:09:53.385655474+02:00[Europe/Brussels]"
},
{...},
{...}]
I would like to use Aggregate:
db.carts.aggregate([
{ $match: { uuid: "6ca05ae0-522a-4db3-b380-2d2330ee1e27" } },
{
$lookup: {
from: "products",
localField: "cartitems",
foreignField: "_id",
as: "output"
}
}
])
This is not working because localField: "cartitems" should be converted:
"$addFields": {
"convertedIdStr": {
"$toString": "$cartitems"
}
But I don't manage to convert because cartitems is an array.
Any help would be great, Thanks a lot!
Use $lookup with pipeline. In $lookup pipeline stage, add $match stage by filtering the (converted to string) product's _id is in cartitems (array) variable.
db.carts.aggregate([
{
$match: {
uuid: "6ca05ae0-522a-4db3-b380-2d2330ee1e27"
}
},
{
$lookup: {
from: "products",
let: {
cartitems: "$cartitems"
},
pipeline: [
{
$match: {
$expr: {
$in: [
{
$toString: "$_id"
},
"$$cartitems"
]
}
}
}
],
as: "output"
}
}
])
Sample Mongo Playground

perform lookup on array from another collection in MongoDB

I have a collection of Orders. each order has a list of Items, and each Item has catalog_id, which is an ObjectId pointing to the Catalogs collection.
I need an aggregate query that will retrieve certain orders - each order with its Items in extended fashion including the Catalog name and SKU. i.e:
Original data structure:
Orders: [{
_id : ObjectId('ord1'),
items : [{
catalog_id: ObjectId('xyz1'),
qty: 5
},
{
catalog_id: ObjectId('xyz2'),
qty: 3
}]
Catalogs: [{
_id : ObjectId('xyz1')
name: 'my catalog name',
SKU: 'XxYxZx1'
},{
_id : ObjectId('xyz2')
name: 'my other catalog name',
SKU: 'XxYxZx2'
}
]
ideal outcome would be:
Orders: [{
_id : ObjectId('ord1'),
items : [{
catalog_id: ObjectId('xyz1'),
catalog_name: 'my catalog name',
catalog_SKU: 'XxYxZx1' ,
qty: 5
},
{
catalog_id: ObjectId('xyz2'),
catalog_name: 'my other catalog name',
catalog_SKU: 'XxYxZx2' ,
qty: 3
}
]
What I did so far was:
db.orders.aggregate(
[
{
$match: {merchant_order_id: 'NIM333'}
},
{
$lookup: {
from: "catalogs",
//localField: 'items.catalog_id',
//foreignField: '_id',
let: { 'catalogId' : 'items.catalog_id' },
pipeline: [
{
$match : {$expr:{$eq:["$catalogs._id", "$$catalogId"]}}
},
{
$project: {"name": 1, "merchant_SKU": 1 }
}
],
as: "items_ex"
},
},
])
but items_ex comes out empty for some reason i cannot understand.
You need to first $unwind the items and reconstruct the array back using $group to match the exact position of qty with the catalogs_id inside the items array
db.orders.aggregate([
{ "$match": { "merchant_order_id": "NIM333" }},
{ "$unwind": "$items" },
{ "$lookup": {
"from": "catalogs",
"let": { "catalogId": "$items.catalog_id", "qty": "$items.qty" },
"pipeline": [
{ "$match": { "$expr": { "$eq": ["$_id", "$$catalogId"] } }},
{ "$project": { "name": 1, "merchant_SKU": 1, "qty": "$$qty" }}
],
"as": "items"
}},
{ "$unwind": "$items" },
{ "$group": {
"_id": "$_id",
"items": { "$push": "$items" },
"data": { "$first": "$$ROOT" }
}},
{ "$replaceRoot": {
"newRoot": {
"$mergeObjects": ["$data", { "items": "$items" }]
}
}}
])
MongoPlayground
You're missing a dollar sign when you define your pipeline variable. There should be:
let: { 'catalogId' : '$items.catalog_id' },
and also this expression returns an array to you need $in instead of $eq:
{
$lookup: {
from: "catalogs",
let: { 'catalogId' : 'items.catalog_id' },
pipeline: [
{
$match : {$expr:{$in:["$_id", "$$catalogId"]}}
},
{
$project: {"name": 1, "merchant_SKU": 1 }
}
],
as: "items_ex"
}
}
Mongo Playground

mongodb aggregation pipeline not returning proper result and slow

I have three collections users, products and orders , orders type has two possible values "Cash" and "Online". One users can have single/multiple products and products have none/single/multiple orders. I want to text search on users collection on name. Now I want to write a query which will return all matching users on text search highest text score first, it might be possible one user's name is returning top score but don't have any products and orders.
I have written a query but it's not returning users who has text score highest but don't have any products/orders. It's only returning users who has record present in all three collections. And also performance of this query is not great taking long time if a user has lot of products for example more than 3000 products. Any help appreciated.
db.users.aggregate(
[
{
"$match": {
"$text": {
"$search": "john"
}
}
},
{
"$addFields": {
"score": {
"$meta": "textScore"
}
}
},
{
"$sort": {
"Score": {
"$meta": "textScore"
}
}
},
{
"$skip": 0
},
{
"$limit": 6
},
{
"$lookup": {
"from": "products",
"localField": "userId",
"foreignField": "userId",
"as": "products"
}
},
{ $unwind: '$products' },
{
"$lookup": {
"from": "orders",
"let": {
"products": "$products"
},
"pipeline": [
{
"$match": {
"$expr": {
"$and": [
{
"$in": [
"$productId",
["$$products.productId"]
]
},
{
"$eq": [
"$orderType",
"Cash"
]
}
]
}
}
}
],
"as": "orders"
}
},
{ $unwind: 'orders' },
{
$group: {
_id: "$_id",
name: { $first: "$name" },
userId: { $first: "$userId" },
products: { $addToSet: "$products" },
orders: { $addToSet: "$orders" },
score: { $first: "$score" },
}
},
{ $sort: { "score": -1 } }
]
);
Issue:
Every lookup produces an array which holds the matched documents. When no documents are found, the array would be empty. Unwinding that empty array would break the pipeline immediately. That's the reason, we are not getting user records with no products/orders. We would need to preserve such arrays so that the pipeline execution can continue.
Improvements:
In orders lookup, the $eq can be used instead of $in, as we already
unwinded the products array and each document now contains only
single productId
Create an index on userId in products collection to make the query more efficient
Following is the updated query:
db.users.aggregate([
{
"$match": {
"$text": {
"$search": "john"
}
}
},
{
"$addFields": {
"score": {
"$meta": "textScore"
}
}
},
{
"$skip": 0
},
{
"$limit": 6
},
{
"$lookup": {
"from": "products",
"localField": "userId",
"foreignField": "userId",
"as": "products"
}
},
{
$unwind: {
"path":"$products",
"preserveNullAndEmptyArrays":true
}
},
{
"$lookup": {
"from": "orders",
"let": {
"products": "$products"
},
"pipeline": [
{
"$match": {
"$expr": {
"$and": [
{
"$eq": [
"$productId",
"$$products.productId"
]
},
{
"$eq": [
"$orderType",
"Cash"
]
}
]
}
}
}
],
"as": "orders"
}
},
{
$unwind: {
"path":"$orders"
"preserveNullAndEmptyArrays":true
}
},
{
$group: {
_id: "$_id",
name: {
$first: "$name"
},
userId: {
$first: "$userId"
},
products: {
$addToSet: "$products"
},
orders: {
$addToSet: "$orders"
},
score: {
$first: "$score"
}
}
},
{
$sort: {
"score": -1
}
}
]);
To get more information on unwind, please check https://docs.mongodb.com/manual/reference/operator/aggregation/unwind/

mongoose populate pass found object reference key

I have a mongoose Group schema which contains invitee (array of sub document) and currentMove, invitee also contains currentMove and I want to get document with only sub document that have same currentMove.
Group.findById("5a03fa29fafa645c8a399353")
.populate({
path: 'invitee.user_id',
select: 'currentMove',
model:"User",
match: {
"currentMove":{
$eq: "$currentMove"
}
}
})
This generates unknown currentMove Object id for match query. I'm not sure if mongoose has this functionality. Can anyone help me, please?
In modern MongoDB releases it is far more efficient to use $lookup here instead of .populate(). Also the basic concept that you want to filter based on a comparison of fields is something that MongoDB does quite well with native operators, but it's not something you can easily transpose into .populate().
In fact the only way possible to actually use with .populate() would be to first retrieve all results, and then use Model.populate() with a $where clause on query all whilst processing the result array with Array.map() in order to apply the local value of each document to the conditions to "join" on.
It's all kind of messy, and involves pulling all results from the server and filtering locally. So $lookup is our best option here, where all of the "filtering" and "matching" actually takes place on the server without needing to pull unnecessary documents over the network just to obtain a result.
Sample Schema
You don't actually include a "schema" in your question, so we can only work with an approximation based on what parts you actually do include in the question. So my example here uses:
const userSchema = new Schema({
name: String,
currentMove: Number
})
const groupSchema = new Schema({
name: String,
topic: String,
currentMove: Number,
invitee: [{
user_id: { type: Schema.Types.ObjectId, ref: 'User' },
confirmed: { type: Boolean, default: false }
}]
});
Unwinding $lookup and $group
From here we have different approaches to the $lookup queries. The first basically involves applying $unwind both before and after the $lookup stage. This is partly since your "reference" is an embedded field within the array, and also partly because it's actually the most efficient query form to use here with a possible "join" result that could potentially exceed the BSON limit ( 16MB for the document ) being avoided:
Group.aggregate([
{ "$unwind": "$invitee" },
{ "$lookup": {
"from": User.collection.name,
"localField": "invitee.user_id",
"foreignField": "_id",
"as": "invitee.user_id"
}},
{ "$unwind": "$invitee.user_id" },
{ "$redact": {
"$cond": {
"if": { "$eq": ["$currentMove", "$invitee.user_id.currentMove"] },
"then": "$$KEEP",
"else": "$$PRUNE"
}
}},
{ "$group": {
"_id": "$_id",
"name": { "$first": "$name" },
"topic": { "$first": "$topic" },
"currentMove": { "$first": "$currentMove" },
"invitee": { "$push": "$invitee" }
}}
]);
The key expression here is the $redact which is processed after the $lookup result is returned. This allows a logical comparison of the "currentMove" values from both the parent document and the "joined" detail for the User objects.
Since we $unwind the array content, we use $group with $push to reconstruct the array ( if you must ) and select the other fields of the original document using $first.
There are ways to examine the schema and generate such a stage, but that's not really in the scope of the question. An example can be seen on Querying after populate in Mongoose. Point being that if you want the fields returned, then you would construct this pipeline stage around using those expressions to return a document of the original shape.
Filter $lookup result
An alternate approach where you are certain that the "unfiltered" result of the "join" will not cause the document to exceed the BSON limit is to instead make a separate target array, and then reconstruct your "joined" array content using $map and $filter, and other array operators:
Group.aggregate([
{ "$lookup": {
"from": User.collection.name,
"localField": "invitee.user_id",
"foreignField": "_id",
"as": "inviteeT"
}},
{ "$addFields": {
"invitee": {
"$map": {
"input": {
"$filter": {
"input": "$inviteeT",
"as": "i",
"cond": { "$eq": ["$$i.currentMove","$currentMove"] }
}
},
"as": "i",
"in": {
"_id": {
"$arrayElemAt": [
"$invitee._id",
{ "$indexOfArray": ["$invitee.user_id", "$$i._id"] }
]
},
"user_id": "$$i",
"confirmed": {
"$arrayElemAt": [
"$invitee.confirmed",
{ "$indexOfArray": ["$invitee.user_id","$$i._id"] }
]
}
}
}
}
}},
{ "$project": { "inviteeT": 0 } },
{ "$match": { "invitee.0": { "$exists": true } } }
]);
Instead of the $redact which would be filtering "documents", we use $filter here with the expression to only return those members of the target array "inviteeT" which share the same "currentMove". Since this is just the "foreign" content, we "join" with the original array using $map and transposing the elements.
To do that "transposition" of values from the original array, we use the $arrayElemAt and $indexOfArray expressions. The $indexOfArray allows us to match up the target's "_id" values with the "user_id" values in the original array and get it's "index" position. We always know this returns a real match because the $lookup did that part for us.
The "index" value is then supplied to $arrayElemAt which similarly applies a "mapping" of the values as an array like "$invitee.confirmed" and returns the value matched at the same index. This is basically a "lookup" between the arrays.
Differing from the first pipeline example, we now still have the "inviteeT" array as well as our re-written "invitee" array courtesy of $addFields. So one way to get rid of that is to add an additional $project and exclude the unwanted "temporary" array. And of course since we did not $unwind and "filter", there are still possible results with no matching array entries at all. So the $match expression uses $exists to test for the 0 index being present in the array result, which means there is "at least one" result, and discards any documents with empty arrays.
MongoDB 3.6 "sub-query"
MongoDB 3.6 makes this a bit cleaner as a new syntax for $lookup allows a more expressive "pipeline" to be given in argument to select the results returned, rather than the simplistic "localField" and "foreignField" matching.
Group.aggregate([
{ "$lookup": {
"from": User.collection.name,
"let": {
"ids": "$invitee._id",
"users": "$invitee.user_id",
"confirmed": "$invitee.confirmed",
"currentMove": "$currentMove"
},
"pipeline": [
{ "$match": {
"$expr": {
"$and": [
{ "$in": ["$_id", "$$users"] },
{ "$eq": ["$currentMove", "$$currentMove"] }
]
}
}},
{ "$project": {
"_id": {
"$arrayElemAt": [
"$$ids",
{ "$indexOfArray": ["$$users", "$_id"] }
]
},
"user_id": "$$ROOT",
"confirmed": {
"$arrayElemAt": [
"$$confirmed",
{ "$indexOfArray": ["$$users", "$_id"] }
]
}
}}
],
"as": "invitee"
}},
{ "$match": { "invitee.0": { "$exists": true } } }
])
So there are some slightly "glitchy" things in there with the usage of mapping arrays of specific values for input due to how these are currently passed into the sub-pipeline via the "let" declaration. This should probably work cleaner, but on the current release candidate this is how it's actually required to be expressed in order to work.
With this new syntax the "let" allows us to declare "variables" from the current document which can then be referenced in the "pipeline" expression which will be executed in order to determine which results to return to the target array.
The $expr here essentially replaces the $redact or $filter conditions used before, as well as combining the "local" to "foreign" key matching which also requires us to declare such a variable. Here we mapped the "$invitee.user_id" values from the source document into a variable which we refer to as "$$users" in the rest of the expressions.
The $in operator here is a variant for the aggregation framework which returns a boolean condition where the first argument "value" is found in the second argument "array". So this is the "foreign key" filter part.
Since this is a "pipeline", we can add a $project stage in addition to the $match which selected the items from the foreign collection. So again we use a similar "transposition" technique to what was described before. This then gives us control of the "shape" of the documents returned in the array, so we don't manipulate the returned array "after" the $lookup like we did previously.
The same case applies though, since no matter what you do here the "sub-pipeline" can of course return no results when the filter conditions do not match. So again the same $exists test is used to discard those documents.
So it's all pretty cool, and once you get used to the power available in the server side "join" functionality of $lookup you likely will never look back. Whilst the syntax is a lot more terse than the "convenience" function that .populate() was introduced for, the reduced traffic load, far more advanced uses and general expressiveness basically make up for that.
As a complete example, I'm also including a self contained listing that demonstrates all of these. And if you run it with a MongoDB 3.6 compatible server attached, then you will even get that demonstration as well.
Needs a recent Node.js v8.x release to run with async/await ( or enable in other supported ), but since that's now the LTS release you really should be running that anyway. At least install one to test :)
const mongoose = require('mongoose'),
Schema = mongoose.Schema;
mongoose.Promise = global.Promise;
mongoose.set('debug',true);
const uri = 'mongodb://localhost/rollgroup',
options = { useMongoClient: true };
const userSchema = new Schema({
name: String,
currentMove: Number
})
const groupSchema = new Schema({
name: String,
topic: String,
currentMove: Number,
invitee: [{
user_id: { type: Schema.Types.ObjectId, ref: 'User' },
confirmed: { type: Boolean, default: false }
}]
});
const User = mongoose.model('User', userSchema);
const Group = mongoose.model('Group', groupSchema);
function log(data) {
console.log(JSON.stringify(data, undefined, 2))
}
(async function() {
try {
const conn = await mongoose.connect(uri,options);
let { version } = await conn.db.admin().command({'buildInfo': 1});
// Clean data
await Promise.all(
Object.entries(conn.models).map(([k,m]) => m.remove() )
);
// Add some users
let users = await User.insertMany([
{ name: 'Bill', currentMove: 1 },
{ name: 'Ted', currentMove: 2 },
{ name: 'Fred', currentMove: 3 },
{ name: 'Sally', currentMove: 4 },
{ name: 'Harry', currentMove: 5 }
]);
await Group.create({
name: 'Group1',
topic: 'This stuff',
currentMove: 3,
invitee: users.map( u =>
({ user_id: u._id, confirmed: (u.currentMove === 3) })
)
});
await (async function() {
console.log('Unwinding example');
let result = await Group.aggregate([
{ "$unwind": "$invitee" },
{ "$lookup": {
"from": User.collection.name,
"localField": "invitee.user_id",
"foreignField": "_id",
"as": "invitee.user_id"
}},
{ "$unwind": "$invitee.user_id" },
{ "$redact": {
"$cond": {
"if": { "$eq": ["$currentMove", "$invitee.user_id.currentMove"] },
"then": "$$KEEP",
"else": "$$PRUNE"
}
}},
{ "$group": {
"_id": "$_id",
"name": { "$first": "$name" },
"topic": { "$first": "$topic" },
"currentMove": { "$first": "$currentMove" },
"invitee": { "$push": "$invitee" }
}}
]);
log(result);
})();
await (async function() {
console.log('Using $filter example');
let result = await Group.aggregate([
{ "$lookup": {
"from": User.collection.name,
"localField": "invitee.user_id",
"foreignField": "_id",
"as": "inviteeT"
}},
{ "$addFields": {
"invitee": {
"$map": {
"input": {
"$filter": {
"input": "$inviteeT",
"as": "i",
"cond": { "$eq": ["$$i.currentMove","$currentMove"] }
}
},
"as": "i",
"in": {
"_id": {
"$arrayElemAt": [
"$invitee._id",
{ "$indexOfArray": ["$invitee.user_id", "$$i._id"] }
]
},
"user_id": "$$i",
"confirmed": {
"$arrayElemAt": [
"$invitee.confirmed",
{ "$indexOfArray": ["$invitee.user_id","$$i._id"] }
]
}
}
}
}
}},
{ "$project": { "inviteeT": 0 } },
{ "$match": { "invitee.0": { "$exists": true } } }
]);
log(result);
})();
await (async function() {
if (parseFloat(version.match(/\d\.\d/)[0]) >= 3.6) {
console.log('New $lookup example. Yay!');
let result = await Group.collection.aggregate([
{ "$lookup": {
"from": User.collection.name,
"let": {
"ids": "$invitee._id",
"users": "$invitee.user_id",
"confirmed": "$invitee.confirmed",
"currentMove": "$currentMove"
},
"pipeline": [
{ "$match": {
"$expr": {
"$and": [
{ "$in": ["$_id", "$$users"] },
{ "$eq": ["$currentMove", "$$currentMove"] }
]
}
}},
{ "$project": {
"_id": {
"$arrayElemAt": [
"$$ids",
{ "$indexOfArray": ["$$users", "$_id"] }
]
},
"user_id": "$$ROOT",
"confirmed": {
"$arrayElemAt": [
"$$confirmed",
{ "$indexOfArray": ["$$users", "$_id"] }
]
}
}}
],
"as": "invitee"
}},
{ "$match": { "invitee.0": { "$exists": true } } }
]).toArray();
log(result);
}
})();
await (async function() {
console.log("Horrible populate example :(");
let results = await Group.find();
results = await Promise.all(
results.map( r =>
User.populate(r,{
path: 'invitee.user_id',
match: { "$where": `this.currentMove === ${r.currentMove}` }
})
)
);
console.log("All members still there");
log(results);
// Then we clean it for null values
results = results.map( r =>
Object.assign(r,{
invitee: r.invitee.filter(i => i.user_id !== null)
})
);
console.log("Now they are filtered");
log(results);
})();
} catch(e) {
console.error(e);
} finally {
mongoose.disconnect();
}
})()
Gives the output for each example as:
Mongoose: users.remove({}, {})
Mongoose: groups.remove({}, {})
Mongoose: users.insertMany([ { __v: 0, name: 'Bill', currentMove: 1, _id: 5a0afda01643cf41789e500a }, { __v: 0, name: 'Ted', currentMove: 2, _id: 5a0afda01643cf41789e500b }, { __v: 0, name: 'Fred', currentMove: 3, _id: 5a0afda01643cf41789e500c }, { __v: 0, name: 'Sally', currentMove: 4, _id: 5a0afda01643cf41789e500d }, { __v: 0, name: 'Harry', currentMove: 5, _id: 5a0afda01643cf41789e500e } ], {})
Mongoose: groups.insert({ name: 'Group1', topic: 'This stuff', currentMove: 3, _id: ObjectId("5a0afda01643cf41789e500f"), invitee: [ { user_id: ObjectId("5a0afda01643cf41789e500a"), _id: ObjectId("5a0afda01643cf41789e5014"), confirmed: false }, { user_id: ObjectId("5a0afda01643cf41789e500b"), _id: ObjectId("5a0afda01643cf41789e5013"), confirmed: false }, { user_id: ObjectId("5a0afda01643cf41789e500c"), _id: ObjectId("5a0afda01643cf41789e5012"), confirmed: true }, { user_id: ObjectId("5a0afda01643cf41789e500d"), _id: ObjectId("5a0afda01643cf41789e5011"), confirmed: false }, { user_id: ObjectId("5a0afda01643cf41789e500e"), _id: ObjectId("5a0afda01643cf41789e5010"), confirmed: false } ], __v: 0 })
Unwinding example
Mongoose: groups.aggregate([ { '$unwind': '$invitee' }, { '$lookup': { from: 'users', localField: 'invitee.user_id', foreignField: '_id', as: 'invitee.user_id' } }, { '$unwind': '$invitee.user_id' }, { '$redact': { '$cond': { if: { '$eq': [ '$currentMove', '$invitee.user_id.currentMove' ] }, then: '$$KEEP', else: '$$PRUNE' } } }, { '$group': { _id: '$_id', name: { '$first': '$name' }, topic: { '$first': '$topic' }, currentMove: { '$first': '$currentMove' }, invitee: { '$push': '$invitee' } } } ], {})
[
{
"_id": "5a0afda01643cf41789e500f",
"name": "Group1",
"topic": "This stuff",
"currentMove": 3,
"invitee": [
{
"user_id": {
"_id": "5a0afda01643cf41789e500c",
"__v": 0,
"name": "Fred",
"currentMove": 3
},
"_id": "5a0afda01643cf41789e5012",
"confirmed": true
}
]
}
]
Using $filter example
Mongoose: groups.aggregate([ { '$lookup': { from: 'users', localField: 'invitee.user_id', foreignField: '_id', as: 'inviteeT' } }, { '$addFields': { invitee: { '$map': { input: { '$filter': { input: '$inviteeT', as: 'i', cond: { '$eq': [ '$$i.currentMove', '$currentMove' ] } } }, as: 'i', in: { _id: { '$arrayElemAt': [ '$invitee._id', { '$indexOfArray': [ '$invitee.user_id', '$$i._id' ] } ] }, user_id: '$$i', confirmed: { '$arrayElemAt': [ '$invitee.confirmed', { '$indexOfArray': [ '$invitee.user_id', '$$i._id' ] } ] } } } } } }, { '$project': { inviteeT: 0 } }, { '$match': { 'invitee.0': { '$exists': true } } } ], {})
[
{
"_id": "5a0afda01643cf41789e500f",
"name": "Group1",
"topic": "This stuff",
"currentMove": 3,
"invitee": [
{
"_id": "5a0afda01643cf41789e5012",
"user_id": {
"_id": "5a0afda01643cf41789e500c",
"__v": 0,
"name": "Fred",
"currentMove": 3
},
"confirmed": true
}
],
"__v": 0
}
]
New $lookup example. Yay!
Mongoose: groups.aggregate([ { '$lookup': { from: 'users', let: { ids: '$invitee._id', users: '$invitee.user_id', confirmed: '$invitee.confirmed', currentMove: '$currentMove' }, pipeline: [ { '$match': { '$expr': { '$and': [ { '$in': [ '$_id', '$$users' ] }, { '$eq': [ '$currentMove', '$$currentMove' ] } ] } } }, { '$project': { _id: { '$arrayElemAt': [ '$$ids', { '$indexOfArray': [ '$$users', '$_id' ] } ] }, user_id: '$$ROOT', confirmed: { '$arrayElemAt': [ '$$confirmed', { '$indexOfArray': [ '$$users', '$_id' ] } ] } } } ], as: 'invitee' } }, { '$match': { 'invitee.0': { '$exists': true } } } ])
[
{
"_id": "5a0afda01643cf41789e500f",
"name": "Group1",
"topic": "This stuff",
"currentMove": 3,
"invitee": [
{
"_id": "5a0afda01643cf41789e5012",
"user_id": {
"_id": "5a0afda01643cf41789e500c",
"__v": 0,
"name": "Fred",
"currentMove": 3
},
"confirmed": true
}
],
"__v": 0
}
]
Horrible populate example :(
Mongoose: groups.find({}, { fields: {} })
Mongoose: users.find({ _id: { '$in': [ ObjectId("5a0afda01643cf41789e500a"), ObjectId("5a0afda01643cf41789e500b"), ObjectId("5a0afda01643cf41789e500c"), ObjectId("5a0afda01643cf41789e500d"), ObjectId("5a0afda01643cf41789e500e") ] }, '$where': 'this.currentMove === 3' }, { fields: {} })
All members still there
[
{
"_id": "5a0afda01643cf41789e500f",
"name": "Group1",
"topic": "This stuff",
"currentMove": 3,
"__v": 0,
"invitee": [
{
"user_id": null,
"_id": "5a0afda01643cf41789e5014",
"confirmed": false
},
{
"user_id": null,
"_id": "5a0afda01643cf41789e5013",
"confirmed": false
},
{
"user_id": {
"_id": "5a0afda01643cf41789e500c",
"__v": 0,
"name": "Fred",
"currentMove": 3
},
"_id": "5a0afda01643cf41789e5012",
"confirmed": true
},
{
"user_id": null,
"_id": "5a0afda01643cf41789e5011",
"confirmed": false
},
{
"user_id": null,
"_id": "5a0afda01643cf41789e5010",
"confirmed": false
}
]
}
]
Now they are filtered
[
{
"_id": "5a0afda01643cf41789e500f",
"name": "Group1",
"topic": "This stuff",
"currentMove": 3,
"__v": 0,
"invitee": [
{
"user_id": {
"_id": "5a0afda01643cf41789e500c",
"__v": 0,
"name": "Fred",
"currentMove": 3
},
"_id": "5a0afda01643cf41789e5012",
"confirmed": true
}
]
}
]
Using populate()
So using .populate() here is actually pretty horrible. Sure it looks like less, but it's actually doing a lot of things that simply are not needed, and all because the "join" does not happen on the server:
// Note that we cannot populate "here" since we need the returned value
let results = await Group.find();
// The value is only in context as we use `Array.map()` to process each result
results = await Promise.all(
results.map( r =>
User.populate(r,{
path: 'invitee.user_id',
match: { "$where": `this.currentMove === ${r.currentMove}` }
})
)
);
console.log("All members still there");
log(results);
// Then we clean it for null values
results = results.map( r =>
Object.assign(r,{
invitee: r.invitee.filter(i => i.user_id !== null)
})
);
console.log("Now they are filtered");
log(results);
So I also included that in the output above, as well as the whole code listing.
The problem becomes evident as you cannot "chain" the populate directly to the first query. You actually need to return the documents ( potentially ALL of them ) in order to use the current document value in a subsequent populate. And this MUST be processed for each document returned.
Not only that but populate() is NOT going to "filter" the array to only those which match, even with the query condition. All it does is set's the unmatched elements to null:
[
{
"_id": "5a0afa889f9f7e4064d8794d",
"name": "Group1",
"topic": "This stuff",
"currentMove": 3,
"__v": 0,
"invitee": [
{
"user_id": null,
"_id": "5a0afa889f9f7e4064d87952",
"confirmed": false
},
{
"user_id": null,
"_id": "5a0afa889f9f7e4064d87951",
"confirmed": false
},
{
"user_id": {
"_id": "5a0afa889f9f7e4064d8794a",
"__v": 0,
"name": "Fred",
"currentMove": 3
},
"_id": "5a0afa889f9f7e4064d87950",
"confirmed": true
},
{
"user_id": null,
"_id": "5a0afa889f9f7e4064d8794f",
"confirmed": false
},
{
"user_id": null,
"_id": "5a0afa889f9f7e4064d8794e",
"confirmed": false
}
]
}
]
This then needs an Array.filter() to be processed again for "each" document returned, which can finally remove the unwanted array items and give you the same result the other aggregation queries are doing.
So it's "really wasteful" and just not a good way to do things. Little point in having a database, when you're actually doing the majority of processing on the server. In fact, we may have well simply returned the populated result and then run an Array.filter() in order to remove the unwanted entries.
This is just not how you write fast and effective code. So the example here is sometimes "what looks simple" is actually doing a lot more damage than good.

How to access the fields from arrays of a object in two different collections?

This is locations collection data.
{
_id: "1",
location: "loc1",
sublocations: [
{
_id: 2,
sublocation: "subloc1",
},
{
_id: 3,
sublocation: "subloc2",
}
]
},
{
_id: "4",
location: "loc2",
sublocations: [
{
_id: 5,
sublocation: "subloc1",
},
{
_id: 6,
sublocation: "subloc2",
}
]
}
This is products collection data
{
_id: "1",
product: "product1",
prices: [
{
_id: 2,
sublocationid: 2, //ObjectId of object in sublocations array
price: 500
},
{
_id: 3,
sublocationid: 5, //ObjectId of object in sublocations array
price: 200
}
]
}
Now I need to get the sublocation in product schema in the prices array. Expected result is as below.
{
_id: "1",
product: "product1",
prices: [
{
_id: 2,
sublocationid: 3,
sublocation: "subloc2",
price: 500
},
{
_id: 3,
sublocationid: 5,
sublocation: "subloc1"
price: 200
}
]
}
To achieve it, I did it like in the following way.
First, performing aggregation on locations collection - $unwind the sublocations array and store the $out in the new collection.
Second, perform aggregation on 'products' collection - $unwind the prices, $lookup the sublocationid from the new collection and $group them.
Third, after getting data delete the data of new collection.
Is there any other simplified way? Please let me know if there is any.
If you want to stick with 3.4 version, you can try this query:
db.products.aggregate([
{
$unwind: {
"path": "$prices"
}
},
{
$lookup: {
"from": "locations",
"localField": "prices.sublocationid",
"foreignField": "sublocations._id",
"as": "locations"
}
},
{
$unwind: {
"path": "$locations"
}
},
{
$unwind: {
"path": "$locations.sublocations"
}
},
{
$addFields: {
"keep": {
"$eq": [
"$prices.sublocationid",
"$locations.sublocations._id"
]
}
}
},
{
$match: {
"keep": true
}
},
{
$addFields: {
"price": {
"_id": "$prices._id",
"sublocationid": "$prices.sublocationid",
"sublocation": "$locations.sublocations.sublocation",
"price": "$prices.price"
}
}
},
{
$group: {
"_id": "$_id",
"product": { "$first": "$product" },
"prices": { "$addToSet": "$price" }
}
}
]);
It's not as nice as 3.6 version though, because of a higher memory consumption.
You can try below aggregation query in 3.6 version.
Since both local field and foreign field are array you have to $unwind both to do equality comparison.
For this you will have to use new $lookup syntax.
$match with $expr provides comparsion between document fields to look up the location's sublocation document for each product's sublocation id.
$project to project the matching sublocation doc.
$addFields with $arrayElemAt to convert the looked up sublocation array into a document.
$group to push all prices with matching sublocation's document for each product.
db.products.aggregate[
{
"$unwind": "$prices"
},
{
"$lookup": {
"from": "locations",
"let": {
"prices": "$prices"
},
"pipeline": [
{
"$unwind": "$sublocations"
},
{
"$match": {
"$expr": [
"$$prices.sublocationid",
"$sublocations._id"
]
}
},
{
"$project": {
"sublocations": 1,
"_id": 0
}
}
],
"as": "prices.sublocations"
}
},
{
"$addFields": {
"prices.sublocations": {
"$arrayElemAt": [
"$prices.sublocations",
0
]
}
}
},
{
"$group": {
"_id": "$_id",
"product": {
"$first": "$product"
},
"prices": {
"$push": "$prices"
}
}
}
])