Mongo $lookup, which way is the fastest? - mongodb

It has been a while since I began using MongoDB aggregation.
It's a great way to perform complex queries, and it has improved my app's performance in ways I never thought possible.
However, I came across $lookup, and it appears that there are 3 ways of performing one. I would like to know the advantages and drawbacks of each of them.
For the examples below, I am starting from collectionA and using fieldA to match documents from collectionB on fieldB.
What I'd call preset $lookup
{
$lookup: {
from: 'collectionB',
localField: 'fieldA',
foreignField: 'fieldB',
as: 'documentsB'
}
}
What I'd call custom $lookup
{
$lookup: {
from: 'collectionB',
let: { valueA: '$fieldA' },
pipeline: [
{
$match: {
$expr: {
$eq: ['$$valueA', '$fieldB']
}
}
}
],
as: 'documentsB'
}
}
Performing a find then an aggregate on collectionB
// Node.js driver: toArray() returns a Promise, so await it (inside an async function)
const docsA = await db.collection('collectionA').find({}).toArray();
// Basically I will extract all values possible for the query to docB
const valuesForB = docsA.map((docA) => docA.fieldA);
const docsB = await db.collection('collectionB').aggregate([
{
$match: {
fieldB: { $in: valuesForB }
}
}
]).toArray();
I'd like to know which one is the fastest,
whether there are any parameters that make one faster than the others,
and whether any of them has limitations.
From what I can tell, I found:
find + aggregate is faster than preset $lookup, which is faster than custom $lookup
But then I wonder why custom $lookup exists...

If the data set is very large, then the preset $lookup will be faster.
Why?
With $lookup, all the data is looked up at the database level, whereas with find + aggregate the intermediate data has to be held in a variable in your application.
The find + aggregate approach takes longer as the data grows, and the aggregation only adds to the amount of data being handled.
TIP
If you want to use find and aggregate, then you should look at MongoDB's distinct query.
Example
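// distinct() returns a Promise in the Node.js driver, so await it (inside an async function)
const arr = await db.collection('collectionA').distinct('fieldA', {});
const docsB = await db.collection('collectionB').aggregate([
{
$match: {
fieldB: { $in: arr }
}
}
]).toArray();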
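To illustrate what the distinct step buys you, here is a minimal in-memory sketch. It assumes the documents from collectionA have already been read into an array; `buildMatchStageForB` is a hypothetical helper name, and the sample values are made up. It only shows how duplicate fieldA values collapse before they reach the `$in` list.

```javascript
// Hypothetical helper: builds the $match stage for collectionB from documents
// already read from collectionA. Field names mirror the question's placeholders.
function buildMatchStageForB(docsA) {
  // A Set drops duplicate fieldA values, which is roughly what distinct('fieldA') gives you.
  const uniqueValues = [...new Set(docsA.map((docA) => docA.fieldA))];
  return { $match: { fieldB: { $in: uniqueValues } } };
}

// Made-up sample data with repeated fieldA values.
const docsA = [{ fieldA: 1 }, { fieldA: 2 }, { fieldA: 1 }, { fieldA: 3 }, { fieldA: 2 }];
const stage = buildMatchStageForB(docsA);
console.log(JSON.stringify(stage)); // {"$match":{"fieldB":{"$in":[1,2,3]}}}
```

The shorter the `$in` list, the less data has to travel between the application and the database, which is the point of the tip above.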

Related

Why MongoDb sort is slow with lookup collections

I have two collections in my mongodb database as follows:
employee_details, with approximately 330000 documents, each of which has a department_id referencing the departments collection
departments, with 2 fields: _id and dept_name
I want to join the two collections with $lookup, using department_id as the foreign key. The join works fine, but the query takes a long time to execute when I add a sort.
Note: The execution is fast if I remove the sort stage or if I remove the $lookup stage.
I have referred to several posts on different blogs and on SO, but none of them gives a solution that includes a sort.
My query is given below:
db.getCollection("employee_details").aggregate([
{
$lookup: {
from: "departments",
localField: "department_id",
foreignField: "_id",
as: "Department"
}
},
{ $unwind: { path: "$Department", preserveNullAndEmptyArrays: true } },
{ $sort: { employee_fname: -1 } },
{ $limit: 10 }
]);
Can someone suggest a way to make the above query run without delay? My client cannot accept the performance delay. I hope there is some way to fix the performance issue, as NoSQL is intended to handle large databases.
Are there any indexing methods available that I can use with my existing collection structure?
Thanks in advance.
Currently the lookup is performed for every document in employee_details, which means 330000 times. If we sort and limit before the lookup, it is performed only 10 times. This will greatly decrease the query time.
db.getCollection('employee_details').aggregate([
{$sort : {employee_fname: -1}},
{$limit :10},
{
$lookup : {
from : "departments",
localField : "department_id",
foreignField : "_id",
as : "Department"
}
},
{ $unwind : { path: "$Department", preserveNullAndEmptyArrays: true }},
])
After trying this, if you want to decrease the response time even further, you can define an index on the sort field.
db.employee_details.createIndex( { employee_fname: -1 } )
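The effect of moving $sort and $limit ahead of $lookup can be checked with a small in-memory simulation. This is only a sketch: the arrays stand in for the two collections, the sizes and names are made up, and `joinDepartment` is a hypothetical function that counts how many times a lookup would be performed.

```javascript
// In-memory stand-ins for the two collections (sizes and values are made up).
const departments = new Map([[1, "Sales"], [2, "Engineering"], [3, "Support"]]);
const employees = Array.from({ length: 1000 }, (_, i) => ({
  employee_fname: "name" + String(i).padStart(4, "0"),
  department_id: (i % 3) + 1,
}));

let lookups = 0;
// Simulates one $lookup probe into departments for a single employee.
function joinDepartment(employee) {
  lookups += 1;
  return { ...employee, Department: departments.get(employee.department_id) };
}

// Descending order by employee_fname, like { $sort: { employee_fname: -1 } }.
function byNameDesc(a, b) {
  if (a.employee_fname < b.employee_fname) return 1;
  if (a.employee_fname > b.employee_fname) return -1;
  return 0;
}

// Original order: join every document, then sort and limit.
lookups = 0;
const joinFirst = employees.map(joinDepartment).sort(byNameDesc).slice(0, 10);
const lookupsJoinFirst = lookups;

// Reordered: sort and limit first, then join only the 10 survivors.
lookups = 0;
const limitFirst = [...employees].sort(byNameDesc).slice(0, 10).map(joinDepartment);
const lookupsLimitFirst = lookups;

console.log(lookupsJoinFirst, lookupsLimitFirst); // 1000 10
```

Both orders return the same 10 documents here because the sort key lives on employee_details and each employee matches at most one department. If you needed to sort on a field that comes from departments, the lookup would have to run first.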

MongoDB purposely return only users that have no matching $lookup results

I have a users schema and a votes schema. I'm trying to return only users who haven't voted (have no returned votes).
I found this answer and, using $lookup, I have the code below, which finds each user and returns all their votes as well. That is halfway to what I'm trying to achieve.
How would I build a query so it only returns a user if they have no votes?
db.users.aggregate([
{
$addFields: { "_id": { "$toString": "$_id" } }
},
{
$lookup:
{
from: "votes",
localField: "_id",
foreignField: "voterId",
as: "votes"
}
}
])
Another question: once I have a working solution, how would I go about scaling this up? Running this query in Robo 3T already takes 9.05 seconds just to load 50 users, and I have almost 40,000 users and over 200,000 votes in my database (which will only grow). Is there a more efficient way to do this? The final code will run on a Node.js server.
Update
As silencedogood said in a deleted answer, I don't need to use $addFields because user._id is automatically converted to a string (I thought it would be an ObjectId() initially). This however only saves 1 second off of loading 50 users (8.14s).
db.users.aggregate([
{
$lookup:
{
from: "votes",
localField: "_id",
foreignField: "voterId",
as: "votes"
}
}
])
I still need to figure out how to only return users who haven't voted.
A sample of your data and the expected result would help. The $addFields stage is likely what is killing your performance. Why do you need it?
If the voterId is stored as a string in the votes collection but as an ObjectId in the users collection (which I'm guessing is the case), you'll need to convert it permanently to an ObjectId if you want maximum performance. Nonetheless, this is roughly what you're looking for:
db.users.aggregate([
{
$lookup:
{
from: "votes",
localField: "_id",
foreignField: "voterId",
as: "votes"
}
},
{ "$match": { "votes.0": { "$exists": false } } }
])
This alone will return only users who don't have a vote entry: essentially a left join that keeps only the rows with no match (an anti-join).
Update
Since they are both strings, you can disregard that aspect of the answer. As to your performance issue... Not sure at the moment. That seems very unrealistic; I've never experienced query times that long with a simple $lookup.
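To make the intent of the `"votes.0": { "$exists": false }` filter concrete, here is a minimal in-memory sketch of the same anti-join. The sample users and votes are made up, and `usersWithoutVotes` is a hypothetical helper, not part of any driver.

```javascript
// Made-up sample data; voterId holds the user's _id as a string, as in the question's update.
const users = [{ _id: "u1", name: "Ann" }, { _id: "u2", name: "Bob" }, { _id: "u3", name: "Cy" }];
const votes = [
  { voterId: "u1", choice: "a" },
  { voterId: "u3", choice: "b" },
  { voterId: "u1", choice: "c" },
];

// Mirrors $lookup followed by { $match: { "votes.0": { $exists: false } } }:
// keep a user only when no vote references them.
function usersWithoutVotes(users, votes) {
  const voterIds = new Set(votes.map((vote) => vote.voterId));
  return users.filter((user) => !voterIds.has(user._id));
}

const nonVoters = usersWithoutVotes(users, votes);
console.log(nonVoters.map((user) => user.name)); // [ 'Bob' ]
```

If the lookup itself is slow, one thing worth checking (an assumption about the cause, not something the question confirms) is whether votes.voterId is indexed, for example with db.votes.createIndex({ voterId: 1 }). Without an index, each user's lookup may have to scan the whole votes collection.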

MongoDB aggregate ID's efficiently for bulk searches?

I have more than 8 references in a MongoDB document. Those are ObjectIds stored in the origin document, and in order to get the real data of the foreign documents I have to run an aggregation query, something like this:
{
$lookup: {
from: "departments",
let: { "department": "$_department" },
pipeline: [
{ $match: { $expr: { $eq: ["$_id", "$$department"] }}},
],
as: "department"
}
},
{
$unwind: { "path": "$department", "preserveNullAndEmptyArrays": true }
},
That works, and instead of an ObjectId I get the real department object.
However, this takes time and makes the find queries take a lot of time.
I have noticed that the same IDs appear multiple times, so it would be better to collect all of the unique IDs, fetch them from the DB just once, and then reuse the same objects.
I don't know of any plugin or service that does this with MongoDB. I can write one myself, but before I work on something like this I want to know whether there is already a service or a package on GitHub.
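The idea described above (collect the unique IDs, fetch them once, reuse the objects) can be sketched in a few lines. This is a hypothetical helper, not an existing package: `attachReferences` and the `fetchByIds` callback are names made up for illustration. With a real driver, `fetchByIds` might wrap something like find({ _id: { $in: ids } }); here it is an in-memory stand-in so the sketch runs on its own.

```javascript
// Hypothetical helper: resolves a reference field with one batched fetch
// instead of one lookup per document.
async function attachReferences(docs, refField, fetchByIds) {
  // Collect each referenced id once, keyed by its string form so that
  // ObjectId-like values with the same content collapse to a single entry.
  const uniqueIds = new Map();
  for (const doc of docs) {
    const id = doc[refField];
    if (id != null) uniqueIds.set(String(id), id);
  }

  const fetched = await fetchByIds([...uniqueIds.values()]);
  const byId = new Map(fetched.map((ref) => [String(ref._id), ref]));

  // Reuse the same fetched object for every document that points at it;
  // unresolved references become null.
  return docs.map((doc) => ({
    ...doc,
    [refField]: byId.get(String(doc[refField])) ?? null,
  }));
}

// In-memory stand-in for the departments collection, counting round trips.
const departments = [{ _id: "d1", name: "Sales" }, { _id: "d2", name: "Support" }];
let fetchCalls = 0;
const fetchDepartments = async (ids) => {
  fetchCalls += 1;
  return departments.filter((dept) => ids.includes(dept._id));
};

// Made-up documents that repeat the same department id.
const employees = [
  { name: "Ann", _department: "d1" },
  { name: "Bob", _department: "d2" },
  { name: "Cy", _department: "d1" },
];

const resultPromise = attachReferences(employees, "_department", fetchDepartments);
resultPromise.then((result) => {
  console.log(result.map((e) => e._department.name), "fetch calls:", fetchCalls);
});
```

For 8 reference fields you would call the helper once per field, which means 8 batched fetches in total regardless of how many documents there are.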

MongoDB — How to sort lookup based on matching field?

In my Mongo database, I have two collections — speakers and speeches, and I'm able to "join" them (getting a speaker's speeches) using the following code block:
// I've already found the 'speaker' in the database
db.collection('speakers').aggregate([
{ $match: { "uuid": speaker.uuid } },
{
$lookup: {
from: "speeches",
localField: "uuid",
foreignField: "speaker_id",
as: "speeches"
}
}
]).toArray();
It works well, but I'd like to sort the speeches array by a date field or a title field, but nothing I do seems to make it happen. None of the examples I've seen here have done what I need them to do. I've tried adding { $sort: { "speech.title": -1 } } after the $lookup block, but it did nothing. Is this even possible?
You can use the $lookup pipeline variant below, available from MongoDB 3.6.
{"$lookup":{
"from":"speeches",
"let":{"uuid":"$uuid"},
"pipeline":[
{"$match":{"$expr":{"$eq":["$$uuid","$speaker_id"]}}},
{"$sort":{"title":1}}
],
"as":"speeches"
}}
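A top-level $sort placed after the $lookup orders the speaker documents flowing through the pipeline, not the elements of the speeches array inside them, which is why it appeared to do nothing. As a minimal sketch, the full pipeline could be built by a small helper. `speakerWithSortedSpeeches` is a hypothetical name, and the uuid value and the "date" field are made-up examples.

```javascript
// Hypothetical helper: builds the speaker pipeline with the speeches sorted
// inside the $lookup sub-pipeline. Collection and field names follow the question.
function speakerWithSortedSpeeches(speakerUuid, sortField, direction = 1) {
  return [
    { $match: { uuid: speakerUuid } },
    {
      $lookup: {
        from: "speeches",
        let: { uuid: "$uuid" },
        pipeline: [
          { $match: { $expr: { $eq: ["$$uuid", "$speaker_id"] } } },
          // Sorting here orders the joined speeches themselves.
          { $sort: { [sortField]: direction } },
        ],
        as: "speeches",
      },
    },
  ];
}

// Made-up uuid; sort speeches by a "date" field, newest first.
const pipeline = speakerWithSortedSpeeches("abc-123", "date", -1);
console.log(JSON.stringify(pipeline, null, 2));
```

With the Node.js driver, the resulting array would be passed to db.collection('speakers').aggregate(pipeline).toArray().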

How can I compare two fields in two different collections in MongoDB?

I am a beginner in MongoDB.
Right now, I am writing a query with Mongo. Please take a look and let me know whether it is possible and, if so, how I can do it.
collection:students
[{id:a, name:a-name}, {id:b, name:b-name}, {id:c, name:c-name}]
collection:school
[{
name:schoolA,
students:[a,b,c]
}]
collection:room
[{
name:roomA,
students:[c,a]
}]
Expected result for roomA
{
name:roomA,
students:[
{id:a name:a-name isRoom:YES},
{id:b name:b-name isRoom:NO},
{id:c name:c-name isRoom:YES}
]
}
Not sure about the isRoom property, but to perform a join across collections, you'd have two basic options:
code it yourself, with multiple queries
use the aggregation pipeline with $lookup operator
As a quick example of $lookup, you can take a given room, unwind its students array (meaning separate out each student element into its own entity), and then look up the corresponding student id in the student collection.
Assuming a slight tweak to your room collection document:
[{
name:"roomA",
students:[ {studentId: "c"}, {studentId: "a"}]
}]
Something like:
db.room.aggregate([
{
$unwind: "$students"
},
{
$lookup:
{
from: "students",
// after $unwind, each id lives at students.studentId
localField: "students.studentId",
foreignField: "id",
as: "classroomStudents"
}
},
{
$project:
{ _id: 0, name : 1 , classroomStudents : 1 }
}
])
That would yield one document per unwound student, something like:
{
name:"roomA",
classroomStudents: [
{id:"c", name:"c-name"}
]
}
{
name:"roomA",
classroomStudents: [
{id:"a", name:"a-name"}
]
}
Disclaimer: I haven't actually run this aggregation, so there may be a few slight issues. Just trying to give you an idea of how you'd go about solving this with $lookup.
More info on $lookup is here.
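For the isRoom flag, the "code it yourself" option can be sketched in application code once the three documents have been fetched. This is a minimal in-memory sketch: `roomWithFlags` is a hypothetical helper, the ids are written as strings, and it assumes isRoom should be "YES" when a student of the school also appears in the room.

```javascript
// Sample documents from the question, with ids written as strings.
const students = [
  { id: "a", name: "a-name" },
  { id: "b", name: "b-name" },
  { id: "c", name: "c-name" },
];
const school = { name: "schoolA", students: ["a", "b", "c"] };
const room = { name: "roomA", students: ["c", "a"] };

// Hypothetical helper: list every student of the school and flag whether
// each one also appears in the given room.
function roomWithFlags(room, school, students) {
  const inRoom = new Set(room.students);
  const byId = new Map(students.map((student) => [student.id, student]));
  return {
    name: room.name,
    students: school.students
      .filter((id) => byId.has(id)) // skip ids with no student document
      .map((id) => ({ ...byId.get(id), isRoom: inRoom.has(id) ? "YES" : "NO" })),
  };
}

const result = roomWithFlags(room, school, students);
console.log(JSON.stringify(result, null, 2));
```

The Set makes the membership test for each student constant time, so the cost grows with the number of students in the school rather than with school size times room size.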