Our MongoDB database has a collection of companies and a collection of users. I'm running an aggregation pipeline, the details of which vary based on various filters chosen by an administrator, but below is a simple example. The companies collection has approximately 15 entries, and there are currently around 40,000 entries in the users collection.
Our aggregation returns user data, but must contain the companyname from the companies collection (there may also be filtering on the company depending on what options are chosen). A user document contains an array of subdocuments for reports - we need to include the last report date for possible sorting/filtering, too. The only other non-standard thing is that there is an object field keydata which needs to be included for searching. This object will be different for every user.
With only 40,000 records in the users collection, this aggregation is currently taking approximately five seconds. Eventually, we will have millions of records in the database, so I need a far more performant way of searching records.
Any pointers, please?
[
{
$lookup: {
from: "users",
let: {
cref: "$companyref",
companyname: "$companyname",
},
as: "users",
pipeline: [
{
$match: {
$expr: {
$eq: ["$$cref", "$companyref"],
},
},
},
{
$addFields: {
lastreportdate: {
$arrayElemAt: [
{
$slice: ["$reports.date", -1],
},
0,
],
},
kd: {
$objectToArray: "$keydata",
},
name: {
$concat: ["$firstname", " ", "$lastname"],
},
companyname: "$$companyname",
},
},
{
$match: {
$or: [
{
"kd.v": /searchstring/,
},
{
name: /searchstring/,
},
{
email: /searchstring/,
},
{
userref: /searchstring/,
},
{
username: /searchstring/,
},
],
},
},
{
$project: {
kd: 0,
lastreportdate: 0,
},
},
],
},
},
{
$unwind: {
path: "$users",
preserveNullAndEmptyArrays: false,
},
},
{
$replaceRoot: {
newRoot: "$users",
},
},
{
$project: {
userref: 1,
keydata: 1,
email: 1,
zipcode: 1,
companyref: 1,
companyname: 1,
firstname: 1,
lastname: 1,
name: 1,
reports: 1,
lastlogin: 1,
},
},
]
Collection companies has an index on companyref and collection users has an index on userref.
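A possible direction, sketched under assumptions rather than measured: the pipeline above starts from companies and, for each company, matches users on companyref, a field for which no users index is listed. The sketch below instead starts from users, filters first, and only then joins the small companies collection through its companyref index. The helper name buildUserSearchPipeline is hypothetical, and the result is meant to be passed to db.users.aggregate(...). Unanchored regular expressions and fields computed in $addFields (kd, name) generally cannot be served by an index, so the search step itself is likely to remain a scan.

```javascript
// Hypothetical helper: builds a pipeline that runs on the users collection.
function buildUserSearchPipeline(search) {
  return [
    {
      $addFields: {
        // A negative index counts from the end, so -1 is the last report date.
        lastreportdate: { $arrayElemAt: ["$reports.date", -1] },
        kd: { $objectToArray: "$keydata" },
        name: { $concat: ["$firstname", " ", "$lastname"] },
      },
    },
    {
      // Filter users before joining, so that only matches reach the $lookup.
      $match: {
        $or: [
          { "kd.v": search },
          { name: search },
          { email: search },
          { userref: search },
          { username: search },
        ],
      },
    },
    {
      // Equality join into companies, which has an index on companyref.
      $lookup: {
        from: "companies",
        localField: "companyref",
        foreignField: "companyref",
        as: "company",
      },
    },
    { $addFields: { companyname: { $arrayElemAt: ["$company.companyname", 0] } } },
    // Drop the helper fields that were only needed for searching and joining.
    { $project: { kd: 0, company: 0 } },
  ];
}

const userSearchPipeline = buildUserSearchPipeline(/searchstring/);
```

Whether this is actually faster depends on the data and the MongoDB version; running the aggregation with explain would show which stages dominate.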
I have two collections, one being Companies and the other being Projects. I am trying to write an aggregation that first grabs all Companies with the status of "Client", and then runs a pipeline that returns, for each of those companies, all projects where company._id === project.companyId, as an array of objects. Shortened examples of the collections are below:
Companies
{
_id: ObjectId('2341908342'),
companyName: "Meta",
address: "123 Facebook Lane",
status: "Client"
}
Projects
{
_id: ObjectId('234123840'),
companyId: '2341908342',
name: "Test Project",
price: 97450,
}
{
_id: ObjectId('23413456'),
companyId: '2341908342',
name: "Test Project 2",
price: 100000,
}
My desired outcome after the Aggregation:
Companies
{
_id: ObjectId('2341908342'),
companyName: "Meta",
address: "123 Facebook Lane",
projects: [ [Project1], [Project2] ],
}
The projects field does not currently exist on the Companies collection, so I imagine we would have to add it. I have also begun writing a $match stage to filter by clients, but I am not sure if it is correct. I am trying to use $lookup for this but cannot figure out the pipeline. Can anyone help me?
Where I'm currently stuck:
try {
const allClientsWithProjects = await companyCollection
.aggregate([
{
$match: {
orgId: {
$in: [new ObjectId(req.user.orgId)],
},
status: { $in: ["Client"] },
},
},
{
$addFields: {
projects: [{}],
},
},
{
$lookup: { from: "projects" /* I am stuck here */ },
},
])
.toArray()
Thank you for any help anyone can provide.
UPDATE*
I feel like I am very close. This is what I have currently; it returns everything, but projects is still an empty array.
try {
const allClients = await companyCollection
.aggregate([
{
$match: {
orgId: {
$in: [new ObjectId(req.user.orgId)],
},
status: {
$in: ["Client"],
},
},
},
{
$lookup: {
from: "projects",
let: {
companyId: {
$toString: [req.user.companyId],
},
},
pipeline: [
{
$match: {
$expr: {
$eq: ["$companyId", "$$companyId"],
},
},
},
],
as: "projects",
},
},
])
.toArray()
All of my company information is being returned correctly for multiple companies, but that projects Array is still []. Any help would be appreciated, and I will still be troubleshooting this.
One option is using a $lookup with a pipeline:
db.company.aggregate([
{
$match: {
_id: {
$in: [
ObjectId("5a934e000102030405000000")
],
},
status: {
$in: [
"Client"
]
},
},
},
{
$lookup: {
from: "Projects",
let: {
companyId: {
$toString: "$_id"
}
},
pipeline: [
{
$match: {
$expr: {
$eq: [
"$companyId",
"$$companyId"
]
}
}
}
],
as: "projects"
}
}
])
See how it works on the playground example
Final answer for my question:
try {
const allClientsAndProjects = await companyCollection
.aggregate([
{
$match: {
orgId: {
$in: [new ObjectId(req.user.orgId)],
},
status: {
$in: ["Client"],
},
},
},
{
$lookup: {
from: "projects",
let: {
companyId: {
$toString: "$_id",
},
},
pipeline: [
{
$match: {
$expr: {
$eq: ["$companyId", "$$companyId"],
},
},
},
],
as: "projects",
},
},
])
.toArray()
Hello, I have the following collections:
const TransactionSchema = mongoose.Schema({
schedule: {
type: mongoose.Schema.ObjectId,
required: true,
ref: "Schedule"
},
uniqueCode: {
type: String,
required: true
},
created: {
type: Date,
default: Date.now
},
status: {
type: String,
required: false
},
})
const ScheduleSchema = mongoose.Schema({
start: {
type: Date,
required: true,
},
end: {
type: Date,
required: false,
},
location: {
type: mongoose.Schema.ObjectId,
required: true,
ref: "Location"
},
})
I want to count how many times each schedule appears in the transaction collection (where the status is equal to 'Active'), group the counts by the schedule's location ID, and then look up the location collection to show the location name.
For example I have the following data.
transaction
[
{
"_id":"identifier",
"schedule":identifier1,
"uniqueCode":"312312312312",
"created":"Date",
"status": 'Active'
},
{
"_id":"identifier",
"schedule":identifier1,
"uniqueCode":"1213123123",
"created":"Date",
"status": "Deleted"
}
]
schedule
[
{
"_id":identifier1,
"start":"date",
"end":"date",
"location": id1
},
{
"_id":identifier2,
"start":"date",
"end":"date",
"location": id2
}
]
I want to get the following result, sorted by its total value and limited to 10 entries:
[
{
"locationName":id1 name,
"total":1
},
{
"locationName":id2 name,
"total":0
}
]
Thank you. Sorry for my bad English.
The query is a bit long and complex:
1. $lookup - Join the schedule collection with the transaction collection, matching:
   - _id (schedule) with schedule (transaction)
   - status is Active
   and return a transactions array.
2. $lookup - Join the schedule collection with the location collection to return a location array.
3. $set - Take the first document in the location array, so that the field is a document instead of an array (this is needed by the later stages).
4. $group - Group by location._id, keeping the location and summing the size of each transactions array into total.
5. $sort - Sort by total, descending.
6. $limit - Limit the output to 10 documents.
7. $project - Shape the output documents.
db.schedule.aggregate([
{
$lookup: {
from: "transaction",
let: {
scheduleId: "$_id"
},
pipeline: [
{
$match: {
$expr: {
$and: [
{
$eq: [
"$schedule",
"$$scheduleId"
]
},
{
$eq: [
"$status",
"Active"
]
}
]
}
}
}
],
as: "transactions"
}
},
{
$lookup: {
from: "location",
localField: "location",
foreignField: "_id",
as: "location"
}
},
{
$set: {
location: {
$first: "$location"
}
}
},
{
$group: {
_id: "$location._id",
location: {
$first: "$location"
},
total: {
$sum: {
$size: "$transactions"
}
}
}
},
{
$sort: {
"total": -1
}
},
{
$limit: 10
},
{
$project: {
_id: 0,
locationName: "$location.name",
total: 1
}
}
])
Sample Mongo Playground
I'm trying to do an aggregation on two collections that have a linkage between them, and I need to access information in an array of objects in one of those collections.
Here are the schemas:
User Schema:
{
_id: ObjectId,
username: String,
password: String,
associatedEvents: [
{
event_id: ObjectId,
isCreator: boolean,
access_level: String,
}
]
}
Event Schema:
{
_id: ObjectId,
title: String,
associated_users: [
{
user_id: ObjectId
}
]
}
I'm attempting to get the users associated with an event for a specific user, and then get their access level information. Here's the aggregation I have:
const eventsJoined = await Event.aggregate([
{
$match: {
$expr: { $in: [id, "$associatedUserIds"] },
},
},
{
$lookup: {
from: "users",
localField: "associatedUserIds",
foreignField: "_id",
as: "user_info",
},
},
{ $unwind: "$user_info" },
{
$unwind: {
path: "$user_info.associatedEvents",
preserveNullAndEmptyArrays: true,
},
},
{
$group: {
_id: "$_id",
title: { $first: "$title" },
description: { $first: "$description" },
startDate: { $first: "$startdate" },
userInfo: { $first: "$user_info" },
usersAssociatedEvents: { $push: "$user_info.associatedEvents" },
},
},
{
$project: {
title: 1,
description: 1,
startDate: 1,
userInfo: 1,
usersAssociatedEvents: "$usersAssociatedEvents",
},
},
]);
And this is the result I'm getting:
[
{
_id: 609d5ad1ef4cdbeb32987739,
title: 'hello',
description: 'desc',
startDate: null,
usersAssociatedEvents: [ [Object] ]
}
]
As you can see, the query is already aggregating the correct data. But the last thing that's tripping me up is the fact that the aggregation is [ [Object] ] for usersAssociatedEvents instead of the actual contents of the object. Any idea on why that would be?
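A likely explanation, assuming the result was printed with console.log in Node.js: console.log only expands nested objects to a limited depth by default and abbreviates anything deeper as [Object], so the data is probably there and only the printout is truncated. A minimal sketch with made-up stand-in data (the values below are not from the real collections):

```javascript
// Stand-in for the aggregation result; the values are invented for illustration.
const eventsJoined = [
  {
    _id: "609d5ad1ef4cdbeb32987739",
    title: "hello",
    description: "desc",
    usersAssociatedEvents: [
      { event_id: "abc", isCreator: true, access_level: "admin" },
    ],
  },
];

// console.log(eventsJoined) abbreviates the innermost objects as [Object].
// Serializing to JSON first prints every nesting level in full.
const printed = JSON.stringify(eventsJoined, null, 2);
console.log(printed);
```

If the full contents appear in this output, the aggregation itself is fine and only the logging needed to change.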
I have the following code:
const profiles = await Profile.aggregate([
{
$lookup: {
from: "users",
localField: "user",
foreignField: "_id",
as: "user",
},
},
{
$unwind: "$user",
},
{
$match: {
"user.name": {
$regex: q.trim(),
$options: "i",
},
},
},
{
$skip: req.params.page ? (req.params.page - 1) * 10 : 0,
},
{
$limit: 11,
},
{
$group: {
_id: "$_id",
skills:{skills}
user: { name: "$name" },
user: { avatar: "$avatar" },
},
},
]);
I want to return only specific fields, such as skills, _id, user.name, and user.avatar, but this doesn't work. I'm pretty sure that the problem is in $group. I want to receive only these fields:
[
{
_id: 5ef78d005d23020ca847aa76,
skills: [ 'asd' ],
user: {
_id: 5ef78c7c5d23020ca847aa75,
name: 'Simeon Lazarov',
avatar: 'uploads\\1593286096227 - background.jpg',
}
}
]
You can make use of $project to get specific fields.
After grouping add the below:
{
$project: {_id:1, skills:1, user:1}
}
A projection value of 0 means that the field is excluded; a value of 1 means that the field is included.
Document reference: https://docs.mongodb.com/manual/reference/operator/aggregation/project/
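As a sketch of how the projection could be applied to the pipeline from the question: the variant below drops the $group stage entirely and lets $project pick the fields, using dotted paths for the nested user fields. This is one possible arrangement, not necessarily what the answer above intended; q and page are hypothetical stand-ins for the request values.

```javascript
const q = "simeon"; // hypothetical search term
const page = 1;     // hypothetical page number

const profilePipeline = [
  { $lookup: { from: "users", localField: "user", foreignField: "_id", as: "user" } },
  { $unwind: "$user" },
  { $match: { "user.name": { $regex: q.trim(), $options: "i" } } },
  { $skip: (page - 1) * 10 },
  { $limit: 11 },
  {
    // Inclusion projection: only the listed fields and nested paths are kept.
    $project: {
      _id: 1,
      skills: 1,
      "user._id": 1,
      "user.name": 1,
      "user.avatar": 1,
    },
  },
];
```

The array would be passed to Profile.aggregate(profilePipeline) in place of the original stages.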
My problem is that I want to do a lookup on the "author" field for each of the arrays of objects "reviews", "followers", and "watching". I don't understand why I get this result in the other arrays: a value is repeated as many times as there are documents in the "reviews" array.
.unwind({ path: '$reviews', preserveNullAndEmptyArrays: true })
.lookup({
from: 'users',
let: { userId: '$reviews.author' },
pipeline: [
{ $match: { $expr: { $eq: ['$_id', '$$userId'] } } },
{
$project: {
name: 1,
username: 1,
photo: 1,
rank: 1,
'premium.status': 1,
online: 1,
},
},
],
as: 'reviews.author',
})
.unwind({ path: '$followers', preserveNullAndEmptyArrays: true })
.lookup({
from: 'users',
let: { userId: '$followers.author' },
pipeline: [
{ $match: { $expr: { $eq: ['$_id', '$$userId'] } } },
{
$project: {
name: 1,
username: 1,
photo: 1,
rank: 1,
'premium.status': 1,
online: 1,
},
},
],
as: 'followers.author',
})
.unwind({ path: '$watching', preserveNullAndEmptyArrays: true })
.lookup({
from: 'users',
let: { userId: '$watching.author' },
pipeline: [
{ $match: { $expr: { $eq: ['$_id', '$$userId'] } } },
{
$project: {
name: 1,
username: 1,
photo: 1,
rank: 1,
'premium.status': 1,
online: 1,
},
},
],
as: 'watching.author',
})
.group({
_id: '$_id',
data: {
$first: '$$ROOT',
},
reviews: {
$push: '$reviews',
},
followers: {
$push: '$followers',
},
watching: {
$push: '$watching',
},
})
This is the result when "Reviews" has documents:
The "followers" and "watching" arrays have nothing in the database, but they are shown here with a value repeated as many times as there are documents in "reviews". I don't know what is happening.
And when all arrays are empty, this happens:
It keeps showing that, but I don't know how to repair it.
In summary, "reviews", "watching", and "followers" each have an "author" field, and I want to do a lookup on the author field of all three, but I run into these problems. Any help would be appreciated.
Example: This is how it looks in the database:
Thank you very much in advance.
The $unwind stage creates a new document for every element in the array you are unwinding. Each new document will contain a copy of every other field in the document.
If the original document looks like
{
_id: "unique",
Array1:["A","B","C"],
Array2:["D","E","F"],
}
After unwinding "Array1", there will be 3 documents in the pipeline:
[
{
_id: "unique",
Array1:"A",
Array2:["D","E","F"]
},{
_id: "unique",
Array1:"B",
Array2:["D","E","F"]
},{
_id: "unique",
Array1:"C",
Array2:["D","E","F"]
}
]
Unwinding a second array ("Array2" here, "watchers" in your case) then expands each of those documents again, so that you end up with the Cartesian product of the arrays. Playground
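The multiplication can be reproduced without a database. The sketch below imitates $unwind on plain JavaScript objects; the unwind function is a simplified stand-in written for this illustration (it ignores options such as preserveNullAndEmptyArrays), not a MongoDB API.

```javascript
// Simplified in-memory imitation of $unwind on a top-level array field:
// one output document per array element, with all other fields copied.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    doc[field].map((value) => ({ ...doc, [field]: value }))
  );
}

const original = [
  { _id: "unique", Array1: ["A", "B", "C"], Array2: ["D", "E", "F"] },
];

const afterFirst = unwind(original, "Array1");
const afterSecond = unwind(afterFirst, "Array2");

console.log(afterFirst.length);  // 3
console.log(afterSecond.length); // 9
```

Three elements in each array give 3 x 3 = 9 documents, every one carrying the same _id.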
In your case, the original document has 2 reviews, but no followers and no watchers, so the pipeline starts with one document, similar to:
[
{
_id: "ObjectId",
data: "other data",
reviews: [{content:"review1", author:"ObjectId"},
{content:"review2", author:"ObjectId"}]
}
]
After the first unwind, you have 2 documents:
[
{
_id: "ObjectId",
data: "other data",
reviews: {content:"review1", author:"ObjectId"}
},
{
_id: "ObjectId",
data: "other data",
reviews: {content:"review2", author:"ObjectId"}
}
]
The first lookup replaces the author ID with user data in each document, then the second unwind is applied to each document.
Since the followers array is empty, the unwind leaves each document in place (because of preserveNullAndEmptyArrays), the lookup returns an empty array, and the third unwind and lookup behave the same way.
Just before the $group stage, the pipeline contains 2 documents with the arrays:
[
{
_id: "ObjectId",
data: "other data",
reviews: {content:"review1", author:"ObjectId"},
followers: {author: []},
watching: {author: []}
},
{
_id: "ObjectId",
data: "other data",
reviews: {content:"review2", author:"ObjectId"},
followers: {author: []},
watching: {author: []}
}
]
Since both documents have the same _id they are grouped together, with the final result containing the first document as "data".
For the arrays, as each document is encountered, the value of the corresponding field is pushed onto the array, resulting in each array having 2 values.
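One way to avoid the duplication, sketched under assumptions: skip $unwind and $group altogether, look up all authors of an array in a single $lookup, and merge them back into the array elements with $map. The helper populateAuthors and the temporary field names (for example reviewsAuthors) are hypothetical, and $set and $unset assume MongoDB 4.2 or later. Unlike the original lookups, this version does not trim the user fields; a $project could be added for that.

```javascript
// Hypothetical helper: returns stages that attach the author document to
// every element of `arrayField` without unwinding the array.
function populateAuthors(arrayField) {
  const tmp = `${arrayField}Authors`; // temporary field name (assumption)
  return [
    {
      // localField resolves to the list of author IDs found in the array.
      $lookup: {
        from: "users",
        localField: `${arrayField}.author`,
        foreignField: "_id",
        as: tmp,
      },
    },
    {
      $set: {
        [arrayField]: {
          $map: {
            // Treat a missing array as empty so the result is always an array.
            input: { $ifNull: [`$${arrayField}`, []] },
            as: "item",
            in: {
              $mergeObjects: [
                "$$item",
                {
                  // Pick the looked-up user whose _id equals this element's author.
                  author: {
                    $arrayElemAt: [
                      {
                        $filter: {
                          input: `$${tmp}`,
                          as: "u",
                          cond: { $eq: ["$$u._id", "$$item.author"] },
                        },
                      },
                      0,
                    ],
                  },
                },
              ],
            },
          },
        },
      },
    },
    { $unset: tmp },
  ];
}

const authorStages = ["reviews", "followers", "watching"].flatMap(populateAuthors);
```

The resulting stages could be passed to Model.aggregate(authorStages); because no document is ever multiplied, empty arrays stay empty and no $group is needed afterwards.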