MongoDB: slow performance pipeline lookup compared to basic lookup - mongodb

I have two collection:
matches:
[{
date: "2020-02-15T17:00:00Z",
players: [
{_id: "5efd9485aba4e3d01942a2ce"},
{_id: "5efd9485aba4e3d01942a2cf"}
]
},
{...}]
and players:
[{
_id: "5efd9485aba4e3d01942a2ce",
name: "Rafa Nadal"
},
{
_id: "5efd9485aba4e3d01942a2ce",
name: "Roger Federer"
},
{...}]
I need to use lookup pipeline because I'm building a graphql resolver with recursive functions and I need nested lookup. I've followed this example https://docs.mongodb.com/datalake/reference/pipeline/lookup-stage#nested-example
My problem is that with pipeline lookup I need 11 seconds but with basic lookup only 0.67 seconds. And my test database is very short! about 1300 players and 700 matches.
This is my pipeline lookup (11 seconds to resolve)
db.collection('matches').aggregate([{
$lookup: {
from: 'players',
let: { ids: '$players' },
pipeline: [{ $match: { $expr: { $in: ['$_id', '$$ids' ] } } }],
as: 'players'
}
}]);
And this my basic lookup (0.67 seconds to resolve)
db.collection('matches').aggregate([{
$lookup: {
from: "players",
localField: "players",
foreignField: "_id",
as: "players"
}
}]);
Why so much difference? In what way can I do faster pipeline lookup?

The thing is that when you do a lookup using pipeline with a match stage, then the index would be used only for the fields that are matched with $eq operator and for the rest index will not be used.
And the example you specified with pipeline will work like this ( again index will not be used here as it is not $eq )
db.matches.aggregate([
{
$lookup: {
from: "players",
let: {
ids: {
$map: {
input: "$players",
in: "$$this._id"
}
}
},
pipeline: [
{
$match: {
$expr: {
$in: [
"$_id",
"$$ids"
]
}
}
}
],
as: "players"
}
}
])
As players is an array of object so it need to be mapped to array of ids first
MongoDB Playground

As #namar sood comments there are several tickets that refer to this issue:
https://jira.mongodb.org/browse/SERVER-37470
https://jira.mongodb.org/browse/SERVER-32549
Meanwhile a solution could be (also works nested):
db.collection('matches').aggregate([
{ $unwind: '$players' },
{
$lookup: {
from: 'players',
let: { id: '$players' },
pipeline: [{ $match: { $expr: { $eq: ['$_id', '$$id' ] } } }],
as: 'players'
},
{ $unwind: '$players' },
{
$group: {
"_id": "$_id",
"data": { "$first": "$$ROOT" },
"players": {$push: "$players"}
}
},
{ $addFields: {"data.players": "$players"} },
{ $replaceRoot: { newRoot: "$data" }}
]);

Related

Change element name from the result set of Mongo DB Query

I have collection like below named as "FormData",
{
"_id": ObjectId("5e3c27bf1ef77236945ef07b"),
"eed12747-0923-4290-b09c-5a05107f5609": "20200206",
"bd637691-782d-4cfd-8624-feeedfe11b3e": "20200206_1#mail.com"
}
I have another collection named as "Form" which will have Title of Fields,
{
"_id": ObjectId("5e3c27bf1ef77236945ef07b"),
"Fields":[
{
"FieldID": "eed12747-0923-4290-b09c-5a05107f5609",
"Title": "Phone"
},
{
"FieldID": "bd637691-782d-4cfd-8624-feeedfe11b3e",
"Title": "Email"
}]
}
Now I have to map element name with Form field title and I need result like below,
{
"_id": ObjectId("5e3c27bf1ef77236945ef07b"),
"Phone": "20200206",
"Email": "20200206_1#mail.com"
}
Please help me to solve this.
Thanks in advance!
You can:
$objectToArray to convert the $$ROOT document into an array of k-v pairs for future lookups
use a sub-pipeline in $lookup to find the value by the uuid
use $mergeObject to combine the original values(i.e. "20200206"...) with the new field name looked up (i.e. "Phone"...)
wrangle the result back into original form using $arrayToObject and $replaceRoot
db.FormData.aggregate([
{
$match: {
"_id": ObjectId("5e3c27bf1ef77236945ef07b")
}
},
{
$project: {
kv: {
"$objectToArray": "$$ROOT"
}
}
},
{
$unwind: "$kv"
},
{
"$lookup": {
"from": "Form",
"let": {
uuid: "$kv.k"
},
"pipeline": [
{
$match: {
"_id": ObjectId("5e3c27bf1ef77236945ef07b")
}
},
{
"$unwind": "$Fields"
},
{
$match: {
$expr: {
$eq: [
"$$uuid",
"$Fields.FieldID"
]
}
}
},
{
$project: {
_id: false,
k: "$Fields.Title"
}
}
],
"as": "formLookup"
}
},
{
$unwind: "$formLookup"
},
{
$project: {
kv: {
"$mergeObjects": [
"$kv",
"$formLookup"
]
}
}
},
{
$group: {
_id: "$_id",
kv: {
$push: "$kv"
}
}
},
{
"$project": {
newDoc: {
"$arrayToObject": "$kv"
}
}
},
{
"$replaceRoot": {
"newRoot": {
"$mergeObjects": [
{
"_id": "$_id"
},
"$newDoc"
]
}
}
}
])
Mongo Playground
Another option is to start from Form collection and avoid $unwind:
$match and $lookup to get all needed data into one document
$objectToArray to get known keys for FormData
Match the items using $indexOfArray and $arrayElemAt and merge them using $mergeObjects. Then use arrayToObject to format the response
db.Form.aggregate([
{$match: {_id: ObjectId("5e3c27bf1ef77236945ef07b")}},
{$lookup: {
from: "FormData",
localField: "_id",
foreignField: "_id",
as: "formLookup",
pipeline: [{$project: {_id: 0}}]
}},
{$set: {formLookup: {$objectToArray: {$first: "$formLookup"}}}},
{$replaceRoot: {
newRoot: {
$mergeObjects: [
{$arrayToObject: {
$map: {
input: "$formLookup",
in: {$mergeObjects: [
{v: "$$this.v"},
{k: {$getField: {
input: {$arrayElemAt: [
"$Fields",
{$indexOfArray: ["$Fields.FieldID", "$$this.k"]}
]},
field: "Title"
}}}
]}
}
}},
{_id: "$_id"}
]
}
}}
])
See how it works on the playground example

Any stragegies to improve mongodb queries that are slow when using advaced lookups?

I've built 3 queries and I am trying to optimize one of them because in my opinion is the best way to do it. Unfortunately, I don't get the performance I expected.
Fastest query:
Simple lookups only, but you need to unset a lot of data in the last stage. Imagine having to do that for 20 or 30 fields you do not want. If you use $project you will mess up with the todos collection data, so unset is the only way to do it.
Execution time for my set of data is: 176ms with my data set offcourse
db.todos
.aggregate([
{
$lookup: {
from: 'users',
localField: 'user_id',
foreignField: '_id',
as: 'user',
},
},
{
$unwind: {
path: '$user',
preserveNullAndEmptyArrays: true,
},
},
{
$lookup: {
from: 'user_data',
localField: 'user.data_id',
foreignField: '_id',
as: 'user.data',
},
},
{
$unwind: {
path: '$user.data',
preserveNullAndEmptyArrays: true,
},
},
{ $unset: ['user._id', 'user.data_id', 'user.deleted', 'user.active', 'user.data._id'] },
])
.explain();
My Favorite:
Advanced nested lookups. I like it cause it is clean and clear but I don't think this one uses indexes probably because $expr is used even if it is done on _id.
Execution time for my set of data is: 713ms
db.todos
.aggregate([
{
$lookup: {
from: 'users',
let: { user_id: '$user_id' },
pipeline: [
{ $match: { $expr: { $eq: ['$_id', '$$user_id'] } } },
{
$lookup: {
from: 'user_data',
let: { data_id: '$data_id' },
pipeline: [
{ $match: { $expr: { $eq: ['$_id', '$$data_id'] } } },
{ $project: { _id: 0, name: 1 } },
],
as: 'data',
},
},
{ $unwind: '$data' },
{ $project: { _id: 0, email: 1, data: 1 } },
],
as: 'user',
},
},
{
$unwind: {
path: '$user',
preserveNullAndEmptyArrays: true,
},
},
])
.explain();
Worst one:
Advanced lookups(not nested). Slowest query.
Execution time for my set of data is: 777ms
db.todos
.aggregate([
{
$lookup: {
from: 'users',
let: { user_id: '$user_id' },
pipeline: [
{ $match: { $expr: { $eq: ['$_id', '$$user_id'] } } },
{ $project: { _id: 0, email: 1, data_id: 1 } },
],
as: 'user',
},
},
{
$unwind: {
path: '$user',
preserveNullAndEmptyArrays: true,
},
},
{
$lookup: {
from: 'user_data',
let: { data_id: '$user.data_id' },
pipeline: [
{ $match: { $expr: { $eq: ['$_id', '$$data_id'] } } },
{ $project: { _id: 0, name: 1 } },
],
as: 'user.data',
},
},
{
$unwind: {
path: '$user.data',
preserveNullAndEmptyArrays: true,
},
},
{ $unset: ['user.data_id'] },
])
.explain();
I was surprised (being a Mongo beginner) that the way I thought about building queries is not the most effective way. Is there a way to improve the second option or am I stuck with the first one?
A todo
{"_id":{"$oid":"612156ec810895cfe9f406bd"},"user_id":{"$oid":"61214827810895cfe9f406ac"},"title":"todo 1"}
A user
{"_id":{"$oid":"61216d36810895cfe9f40713"},"active":true,"deleted":false,"data_id":{"$oid":"61216d42810895cfe9f40716"},"email":"test2#test2.com"}
A user data
{"_id":{"$oid":"6121488f810895cfe9f406ad"},"name":{"first":"Fanny","last":"Lenny"}}
Thank you

how to reduce unnecessary unwind stages from aggregation pipeline

Like if i'm applying many lookup stages in aggregation pipeline and each lookup is followed by an unwind(just to covert into object) first question does it affect query performance? and if yes how to do that in optimised manner
Note: all lookup's will return only one object
For Ex:
xyz.aggregate([
{ $lookup:{ ----}} //first lookup
{$unwind :{----}} //first unwind
{ $lookup:{ ----}} //second lookup
{$unwind :{----}} //second unwind
{ $lookup:{ ----}} //third lookup
{$unwind :{----}} //third unwind
{ $lookup:{ ----}} //fourth lookup
{$unwind :{----}} //fourth unwind
])
In reference to comments, here is advanced $lookup:
$lookup: {
from: 'accounts',
let: { "localAccountField": "$account" },
pipeline: [
{
$match: {
$expr: {
$eq: ["$_id", "$$localAccountField"]
}
}
},
{
$project: {
_id: 1,
user: 1
}
},
{
$lookup: {
from: 'users',
let: { 'localUserField': "$user" },
pipeline: [
{
$match: {
$expr: {
$eq: ["$_id", "$$localUserField"]
}
}
},
{
$project: {
_id: 1,
username: "$uid",
phone:"$phoneNumber",
email: "$email.add",
name: {
$concat: [
"$profile.name.first",
' ',
"$profile.name.last"
]
},
}
}
],
as: "users"
}
},
{
$lookup: {
from: 'documents',
let: { 'localDocumentField': "$user" },
pipeline: [
{
$match: {
$expr: {
$eq: ["$user", "$$localDocumentField"]
},
status:"verified",
"properties.expirydate": { $exists: true, $ne: "" },
name: "idcard"
}
},
{
$project: {
_id: 0,
cnic: "$properties.number"
}
}
],
as: "documents"
}
}
],
as: 'account'
}

$lookup when foreignField is in nested array

I have two collections :
Student
{
_id: ObjectId("657..."),
name:'abc'
},
{
_id: ObjectId("593..."),
name:'xyz'
}
Library
{
_id: ObjectId("987..."),
book_name:'book1',
issued_to: [
{
student: ObjectId("657...")
},
{
student: ObjectId("658...")
}
]
},
{
_id: ObjectId("898..."),
book_name:'book2',
issued_to: [
{
student: ObjectId("593...")
},
{
student: ObjectId("594...")
}
]
}
I want to make a Join to Student collection that exists in issued_to array of object field in Library collection.
I would like to make a query to student collection to get the student data as well as in library collection, that will check in issued_to array if the student exists or not if exists then get the library document otherwise not.
I have tried $lookup of mongo 3.6 but I didn`t succeed.
db.student.aggregate([{$match:{_id: ObjectId("593...")}}, $lookup: {from: 'library', let: {stu_id:'$_id'}, pipeline:[$match:{$expr: {$and:[{"$hotlist.clientEngagement": "$$stu_id"]}}]}])
But it thorws error please help me in regard of this. I also looked at other questions asked at stackoverflow like. question on stackoverflow,
question2 on stackoverflow but these are comapring simple fields not array of objects. please help me
I am not sure I understand your question entirely but this should help you:
db.student.aggregate([{
$match: { _id: ObjectId("657...") }
}, {
$lookup: {
from: 'library',
localField: '_id' ,
foreignField: 'issued_to.student',
as: 'result'
}
}])
If you want to only get the all book_names for each student you can do this:
db.student.aggregate([{
$match: { _id: ObjectId("657657657657657657657657") }
}, {
$lookup: {
from: 'library',
let: { 'stu_id': '$_id' },
pipeline: [{
$unwind: '$issued_to' // $expr cannot digest arrays so we need to unwind which hurts performance...
}, {
$match: { $expr: { $eq: [ '$issued_to.student', '$$stu_id' ] } }
}, {
$project: { _id: 0, "book_name": 1 } // only include the book_name field
}],
as: 'result'
}
}])
This might not be a very good answer, but if you can change your schema of Library to:
{
_id: ObjectId("987..."),
book_name:'book1'
issued_to: [
ObjectId("657..."),
ObjectId("658...")
]
},
{
_id: "ObjectId("898...")",
book_name:'book2'
issued_to: [
ObjectId("593...")
ObjectId("594...")
]
}
Then when you do:
{
$lookup: {
from: 'student',
localField: 'issued_to',
foreignField: '_id',
as: 'issued_to_students', // this creates a new field without overwriting your original 'issued_to'
}
},
You should get, based on your example above:
{
_id: ObjectId("987..."),
book_name:'book1'
issued_to_students: [
{ _id: ObjectId("657..."), name: 'abc', ... },
{ _id: ObjectId("658..."), name: <name of this _id>, ... }
]
},
{
_id: "ObjectId("898...")",
book_name:'book2'
issued_to: [
{ _id: ObjectId("593..."), name: 'xyz', ... },
{ _id: ObjectId("594..."), name: <name of this _id>, ... }
]
}
You need to $unwind the issued_to from library collection to match the issued_to.student with _id
db.student.aggregate([
{ "$match": { "_id": mongoose.Types.ObjectId(id) } },
{ "$lookup": {
"from": Library.collection.name,
"let": { "studentId": "$_id" },
"pipeline": [
{ "$unwind": "$issued_to" },
{ "$match": { "$expr": { "$eq": [ "$issued_to.student", "$$studentId" ] } } }
],
"as": "issued_to"
}}
])

$match in $lookup result

I have next mongo code:
db.users.aggregate([
{
$match: {
$and: [
{ UserName: { $eq: 'administrator' } },
{ 'Company.CompanyName': { $eq: 'test' } }
]
}
},
{
$lookup: {
from: "companies",
localField: "CompanyID",
foreignField: "CompanyID",
as: "Company"
}
},
])
The $lookup part of the code working great. I got next result:
But if I add $match to the code, it brings nothing.
I found that the problem is in the second match: { 'Company.CompanyName': { $eq: 'test' } }, but I can not realize what is wrong with it.
Any ideas?
UPDATE:
I had also tried $unwind on the $lookup result, but no luck:
db.users.aggregate([
{
$match: {
$and: [
{ UserName: { $eq: 'administrator' } },
{ 'Company.CompanyName': { $eq: 'edt5' } }
]
}
},
{ unwind: '$Company' },
{
$lookup: {
from: 'companies',
localField: 'CompanyID',
foreignField: 'CompanyID',
as: 'Company'
}
},
])
With MongoDB 3.4, you can run an aggregation pipeline that uses the $addFields pipeline and a $filter operator to only return the Company array with elements that match the given condition. You can then wrap the $filter expression with the $arrayElemAt operator to return a single document which in essence incorporates the $unwind functionality by flattening the array.
Follow this example to understand the above concept:
db.users.aggregate([
{ "$match": { "UserName": "administrator" } },
{
"$lookup": {
"from": 'companies',
"localField": 'CompanyID',
"foreignField": 'CompanyID',
"as": 'Company'
}
},
{
"$addFields": {
"Company": {
"$arrayElemAt": [
{
"$filter": {
"input": "$Company",
"as": "comp",
"cond": {
"$eq": [ "$$comp.CompanyName", "edt5" ]
}
}
}, 0
]
}
}
}
])
Below answer is for mongoDB 3.6 or later.
Given that:
You have a collection users with a field CompanyID and a collection of companies with a field CompanyID
you want to lookup Companies on Users by matching CompanyID, where additionally:
each User must match condition: User.UserName equals administrator
each Company on User must match condition: CompanyName equals edt5
The following query will work for you:
db.users.aggregate([
{ $match: { UserName: 'administrator' } },
{
$lookup: {
from: 'companies',
as: 'Company',
let: { CompanyID: '$CompanyID' },
pipeline: [
{
$match: {
$expr: {
$and: [
{ $eq: ['$CompanyID', '$$CompanyID'] },
{ $eq: ['$CompanyName', 'edt5'] },
]
}
}
}
]
}
},
])
Explanation:
This is the way to perform left join queries with conditions more complex than simple foreign / local field equality match.
Instead of using localField and foreignField, you use:
let option where you can map local fields to variables,
pipeline option where you can specify aggregation Array.
In pipeline you can use $match filter, with $expr, where you can reuse variables defined earlier in let.
More info on $lookup
Nice tutorial
here is code for fitering array inside lookup.
const userId = req.userData.userId;
const limit = parseInt(req.params.limit);
const page = parseInt(req.params.page);
Collection.aggregate([
{ $match: {} },
{ $sort: { count: -1 } },
{ $skip: limit * page },
{ $limit: limit },
{
$lookup: {
from: Preference.collection.name,
let: { keywordId: "$_id" },
pipeline: [
{
$match: {
$expr: {
$and: [
{ $eq: ["$keyword", "$$keywordId"] },
{
$eq: ["$user", mongoose.Types.ObjectId(userId)],
},
],
},
},
},
],
as: "keywordData",
},
},
{
$project: {
_id: 0,
id: "$_id",
count: 1,
for: 1,
against: 1,
created_at: 1,
updated_at: 1,
keyword: 1,
selected: {
$cond: {
if: {
$eq: [{ $size: "$keywordData" }, 0],
},
then: false,
else: true,
},
},
},
}])