MongoDB Query Optimization using Multiple $lookup's and $group

This is a simplified schema of the database I am working with:
Collection: documents
{
"_Id": "1",
"business": "e.g food",
"relationships": "192",
"components": "ObjectId(34927493..)",
"_Score": "10",
...
}
Collection: components
{
"_Id": "280948304320",
"assessments": "8394",
"relationships": "192",
"results":"ObjectId("82394792343")...."// can be many results
}
Collection: results
{
"_Id": "7978394243",
"state": "severe",
"parentComponent": "ObjectId("28907403")",
"confidence":"0.5",
"category":"Inspection"
}
I have a MongoDB query that takes 200+ seconds to execute. Here it is:
db.documents.aggregate([
{$match:
{ "business" : "food"}
},
{
$unwind: "$components"
},
{
$lookup:
{
from: "components",
localField: "components",
foreignField: "_id",
as: "matching_components"
}
},
{
$unwind: "$matching_components"
},
{
$lookup:
{
from: "results",
localField: "components",
foreignField: "parentComponent",
as: "list_results"
}
},
{
$unwind: "$list_results"
},
{$group :
{ _id : '$list_results.state', count : {$sum : 1}}
}
])
I am wondering if there is any way to improve the performance of this query. I tried adding a $group stage at the beginning to group the documents by business category, but that did not work because it removes the fields needed by the rest of the pipeline. I have indexed all the fields I am looking across.
To be clear, I want to group the documents by their business field, then map to another collection called components that contains results. After a second lookup that maps to the results collection, I ultimately want to count the frequency of each state by business. Currently, as you can see, I am using a $match at the beginning instead, just to see if the query works for one business type. Though the query works, it is taking around 140 seconds.
EDIT: Example Result from this Aggregation:
{
"_id" : "State1",
"count" : 90699.0
}
{
"_id" : "State2",
"count" : 448869.0
}
{
"_id" : "State3",
"count" : 71399.0
}
{
"_id" : "State4",
"count" : 513928.0
}
{
"_id" : "State5",
"count" : 765509.0
}
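One direction that may be worth trying, sketched under assumptions rather than measured: the second $lookup in the query above joins on the unwound components value, not on matching_components, so the first $lookup and its $unwind only add work unless fields from the components collection are needed. Skipping it changes the counts only if some documents hold references to components that no longer exist. Grouping on a compound key then gives the counts for every business in one pass, without the initial $match. The sketch assumes an index on results.parentComponent and that components is an array of references; the early $project is there only to keep intermediate rows small.

```javascript
// Sketch only: collection and field names follow the question's schema.
const pipeline = [
  // One row per (document, component reference).
  { $unwind: "$components" },
  // Keep only the fields the later stages use.
  { $project: { _id: 0, business: 1, components: 1 } },
  // Join results directly; assumes an index on results.parentComponent.
  {
    $lookup: {
      from: "results",
      localField: "components",
      foreignField: "parentComponent",
      as: "list_results"
    }
  },
  { $unwind: "$list_results" },
  // Compound key: one count per (business, state) pair.
  {
    $group: {
      _id: { business: "$business", state: "$list_results.state" },
      count: { $sum: 1 }
    }
  }
];

// In the mongo shell:
// db.documents.aggregate(pipeline, { allowDiskUse: true })
```

Running the same pipeline through explain would show whether the index on parentComponent is actually used for the join.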

Related

MongoDB aggregate return count of documents or 0

I have the following aggregate query:
db.user.aggregate()
.match({"business_account_id" : ObjectId("5e3377bcb1dbae5124e4b6bf")})
.lookup({
'localField': 'profile_id',
'from': 'profile',
'foreignField' : '_id',
'as': 'profile'
})
.unwind("$profile")
.match({"profile.type" : "consultant"})
.group({_id:"$business_account_id", count:{$sum:1}})
My goal is to count how many consultant users belong to a given company.
Using the query above, if at least one user belongs to the provided business_account_id, I get a correct count value.
But if there are no users, the .match({"business_account_id" : ObjectId("5e3377bcb1dbae5124e4b6bf")}) will return an empty (0 documents) result.
How can I get a count: 0 if there are no users assigned to the company?
I tried many approaches based on other threads, but I couldn't get a count: 0.
UPDATE 1
A simple version of my problem:
user collection
{
"_id" : ObjectId("5e36beb7b1dbae5124e4b6dc"),
"business_account_id" : ObjectId("5e3377bcb1dbae5124e4b6bf"),
},
{
"_id" : ObjectId("5e36d83db1dbae5124e4b732"),
"business_account_id" : ObjectId("5e3377bcb1dbae5124e4b6bf"),
}
Using the following aggregate query:
db.getCollection("user").aggregate([
{ "$match" : {
"business_account_id" : ObjectId("5e3377bcb1dbae5124e4b6bf")
}
},
{ "$group" : {
"_id" : "$business_account_id",
"count" : { "$sum" : 1 }
}
}
]);
I get:
{
"_id" : ObjectId("5e3377bcb1dbae5124e4b6bf"),
"count" : 2
}
But if I query for an ObjectId that doesn't exist, such as:
db.getCollection("user").aggregate([
{ "$match" : {
"business_account_id" : ObjectId("5e335c873e8d40676928656d")
}
},
{ "$group" : {
"_id" : "$business_account_id",
"count" : { "$sum" : 1 }
}
}
]);
I get a completely empty result. I would expect to get:
{
"_id" : ObjectId("5e335c873e8d40676928656d"),
"count" : 0
}
The root of the problem is that if no document in the user collection satisfies the initial $match, there is nothing to pass to the next stage of the pipeline. If the business_account_id actually exists somewhere (perhaps in another collection?), run the aggregation against that collection so that the initial match finds at least one document. Then use $lookup to find the users. If you are using MongoDB 3.6+, you can combine the user and profile lookups. Lastly, use $size to count the elements in the users array.
(You will probably need to tweak the collection and field names.)
db.businesses.aggregate([
{$match:{_id : ObjectId("5e3377bcb1dbae5124e4b6bf")}},
{$project: { _id:1 }},
{$lookup:{
from: "users",
let: {"busId":"$_id"},
as: "users",
pipeline: [
{$match: {$expr:{$eq:[
"$$busId",
"$business_account_id"
]}}},
{$lookup:{
localField: "profile_id",
from: "profile",
foreignField : "_id",
as: "profile"
}},
{$match: { "profile.type" : "consultant"}}
]
}},
{$project: {
_id: 0,
business_account_id: "$_id",
count:{$size:"$users"}
}}
])
Since you match on a non-existent business_account_id value, no documents reach the later stages and the aggregation returns nothing.
Workaround: run two sub-pipelines in parallel with the $facet operator to get a default value if the match has no result.
Note: Make sure the user collection has at least one document, otherwise this won't work.
db.user.aggregate([
{
$facet: {
not_found: [
{
$limit: 1
},
{
$project: {
"_id": ObjectId("5e3377bcb1dbae5124e4b6bf"),
"count": { $literal: 0 }
}
}
],
found: [
{
"$match": {
"business_account_id": ObjectId("5e3377bcb1dbae5124e4b6bf")
}
},
{
"$group": {
"_id": "$business_account_id",
"count": { "$sum": 1 }
}
}
]
}
},
{
$replaceRoot: {
newRoot: {
$mergeObjects: [
{
$arrayElemAt: ["$not_found", 0]
},
{
$arrayElemAt: ["$found", 0]
}
]
}
}
}
])
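If a default on the client side is acceptable, a different and simpler technique than either pipeline above is to keep the original $match + $group query and fill in the zero in application code. The helper below is hypothetical (the name countOrZero is not from any library) and only illustrates the idea; ids are shown as plain strings to keep the snippet runnable outside the shell.

```javascript
// Hypothetical helper: turns the 0-or-1 row result of the
// $match + $group pipeline into a row that always carries a count.
function countOrZero(rows, businessAccountId) {
  if (rows.length === 0) {
    // Nothing matched, so build the default row ourselves.
    return { _id: businessAccountId, count: 0 };
  }
  return rows[0];
}

// In the shell, `rows` would come from db.user.aggregate([...]).toArray().
const found = countOrZero(
  [{ _id: "5e3377bcb1dbae5124e4b6bf", count: 2 }],
  "5e3377bcb1dbae5124e4b6bf"
);
const missing = countOrZero([], "5e335c873e8d40676928656d");
```

Unlike the $facet workaround, this does not depend on the user collection containing at least one document.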

MongoDB Aggregation : Double lookup, and merge lookup response to respective object

I'm trying an aggregation but I can't find the right pipeline to do it.
So, this is a part of my document model :
//company.js
{
"_id" : "5dg8aa8c435b1e2868c841f6",
"name" : "My Corp",
"externalId" : "d7f348c9-c69b-69c4-923c-91458c53dc22",
"professionals_customers" : [
{
"company" : "6f4d01eb3b948150c2aad9c0"
},
{
"company" : "5dg7aa8c366b1e2868c841f6",
"contact" : "5df8ab5c355b1e2999c841f7"
}
],
}
I'm trying to return the professionals_customers field hydrated with data, like a classic populate would do.
The company field comes from the companies collection, and contact is provided by the users collection.
The desired output must look like :
{
"professionals_customers" : [
{
"company": {
"_id": "6f4d01eb3b948150c2aad9c0",
"name": "Transtar",
"externalId": "d7f386c9-c79b-49c5-905c-90750c42dc22",
},
},
{
"company": {
"_id": "5dg7aa8c366b1e2868c841f6",
"name": "Aperture",
"externalId": "d7f386c9-c69b-49c4-905c-90750c53dc22",
},
"contact" : {
"_id": "5df8ab5c355b1e2999c841f7",
"firstname": "Caroline",
"lastname": "Glados",
"externalId": "d7f386c9-c69b-49c4-905c-90750c53dc22", //same externalId as above, the user belongs to the company
},
}
]
}
At this point I've tried multiple solutions but I can't reach my goal.
let query = [{
$match : { _id : companyId }
},{
$lookup : {
from: 'companies',
localField : 'professionals_customers.company',
foreignField : '_id',
as : 'professionalsCustomers'
}
},{
$lookup : {
from: 'users',
localField : 'professionals_customers.contact',
foreignField : '_id',
as : 'contacts'
}
}]
At this point, I've got two new arrays with all the needed information, but I don't know how to get the right contact grouped with the right company. Also, maybe it's easier to populate the data (with $lookup) while keeping the initial structure than to regroup professionalsCustomers and contacts through the shared externalId.
Additional information:
-A user that belongs to a company has the same externalId.
-I don't want to use a classical populate, because I need to do some other operations afterwards.
Try this query :
db.companies.aggregate([
{ $match: { _id: companyId } },
{ $unwind: "$professionals_customers" },
{
$lookup: {
from: "companies",
localField: "professionals_customers.company",
foreignField: "_id",
as: "professionals_customers.company"
}
},
{
$lookup: {
from: "users",
localField: "professionals_customers.contact",
foreignField: "_id",
as: "professionals_customers.contact"
}
},
{
$addFields: {
"professionals_customers.company": {
$arrayElemAt: ["$professionals_customers.company", 0]
},
"professionals_customers.contact": {
$arrayElemAt: ["$professionals_customers.contact", 0]
}
}
},
{
$group: { _id: "$_id", professionals_customers: { $push: "$professionals_customers" }, data: { $first: "$$ROOT" } }
},
{ $addFields: { "data.professionals_customers": "$professionals_customers" } },
{ $replaceRoot: { newRoot: "$data" } }
])
Note: If needed, convert string fields or inputs to ObjectId(). The basic point is to check that the types of the two fields being compared (or of the input and the field in the database) match.
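For comparison, the asker's own two $lookup stages can also be kept as they are, with the two result arrays stitched back together in application code. This is a different technique from the $unwind/$group pipeline above, shown only as a rough sketch: hydrateCustomers is a made-up helper, and the sample data is copied from the question.

```javascript
// Hypothetical helper: `companies` and `users` stand for the
// `professionalsCustomers` and `contacts` arrays produced by the two
// $lookup stages in the question. Ids are compared as strings.
function hydrateCustomers(customers, companies, users) {
  const companyById = new Map(companies.map(c => [String(c._id), c]));
  const userById = new Map(users.map(u => [String(u._id), u]));
  return customers.map(entry => {
    const hydrated = {};
    if (entry.company !== undefined) {
      hydrated.company = companyById.get(String(entry.company));
    }
    // Entries without a contact stay without one.
    if (entry.contact !== undefined) {
      hydrated.contact = userById.get(String(entry.contact));
    }
    return hydrated;
  });
}

const hydrated = hydrateCustomers(
  [
    { company: "6f4d01eb3b948150c2aad9c0" },
    { company: "5dg7aa8c366b1e2868c841f6", contact: "5df8ab5c355b1e2999c841f7" }
  ],
  [
    { _id: "6f4d01eb3b948150c2aad9c0", name: "Transtar" },
    { _id: "5dg7aa8c366b1e2868c841f6", name: "Aperture" }
  ],
  [{ _id: "5df8ab5c355b1e2999c841f7", firstname: "Caroline", lastname: "Glados" }]
);
```

The original order of professionals_customers is preserved because the result is built by mapping over that array.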

$lookup in MongoDB Provides Unexpected Data Structure Result

I am trying to understand why a $lookup I'm using in my MongoDB aggregation is producing the result it is.
First off, my initial data looks like this:
"subscriptions": [
{
"agency": "3dg2672f145d0598be095634", // This is an ObjectId
"memberType": "primary"
}
]
Now, what I want to do is a simple $lookup, pulling in the related data for the ObjectId that's currently being populated as the value to the "agency" field.
What I tried doing was a $lookup like this:
{
"from" : "agencies",
"localField" : "subscriptions.0.agency",
"foreignField" : "_id",
"as" : "subscriptions.0.agency"
}
So, basically what I want to do is go get that info related to that ObjectId ref, and populate it right here, in place of where the ObjectId currently resides.
What I'd expect as a result is something like this:
"subscriptions": [
{
"agency": [
{
_id: <id-value>,
name: <name-value>,
address: <address-value>
}
],
"memberType": "primary"
}
]
Instead, I end up with this (with my "memberType" prop now nowhere to be found):
"subscriptions" : {
"0" : {
"agency" : [ <agency-data> ]
}
}
Why is this the result of the $lookup, and how can I get the data structure I'm looking for here?
To clarify further, in the docs they mention using an $unwind BEFORE the $lookup when it's an array field. But in this case, the actual local field being targeted and replaced by the $lookup is NOT an array, but it is within an array. So I'm not clear on what the problem is.
You need to $unwind the array so that the "localField" can be matched against the "foreignField", and then $group to rebuild the array:
db.collection.aggregate([
{ "$unwind": "$subscriptions" },
{ "$lookup": {
"from": "agencies",
"localField": "subscriptions.agency",
"foreignField": "_id",
"as": "subscriptions.agency"
}},
// memberType stays inside each element that is pushed back
{ "$group": {
"_id": "$_id",
"subscriptions": { "$push": "$subscriptions" }
}}
])
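To see what the unwind, lookup, group round trip does to the document shape, here is a plain JavaScript imitation of the three stages. It is not MongoDB code and does not touch a database; the agency document and the numeric _id are made-up sample data.

```javascript
// Plain JavaScript imitation of $unwind, $lookup and $group.
const agencies = [{ _id: "3dg2672f145d0598be095634", name: "ABC" }];
const doc = {
  _id: 1,
  subscriptions: [
    { agency: "3dg2672f145d0598be095634", memberType: "primary" }
  ]
};

// $unwind: one copy of the document per array element.
const unwound = doc.subscriptions.map(s => ({ _id: doc._id, subscriptions: s }));

// $lookup with as: "subscriptions.agency": only that one field is
// replaced, by an array of the matching agency documents.
const joined = unwound.map(d => ({
  _id: d._id,
  subscriptions: {
    ...d.subscriptions,
    agency: agencies.filter(a => a._id === d.subscriptions.agency)
  }
}));

// $group with $push: collect the elements back into one array.
const regrouped = {
  _id: doc._id,
  subscriptions: joined.map(d => d.subscriptions)
};
```

Because the join only replaces the agency field of each unwound element, memberType survives and ends up back in the array.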
Basically, what OP is looking for is to transform the data into the desired format after looking up another collection. Assume there are two collections, C1 and C2, where C1 contains the document
{ "_id" : ObjectId("5b50b8ebfd2b5637081105c6"), "subscriptions" : [ { "agency" : "3dg", "memberyType" : "primary" } ] }
and C2 contains
{ "_id" : ObjectId("5b50b984fd2b5637081105c8"), "agency" : "3dg", "name" : "ABC", "address" : "1 some street" }
If the following query is executed against the database
db.C1.aggregate([
{$unwind: "$subscriptions"},
{
$lookup: {
from: "C2",
localField: "subscriptions.agency",
foreignField: "agency",
as: "subscriptions.agency"
}
}
])
we get the result
{
"_id": ObjectId("5b50b8ebfd2b5637081105c6"),
"subscriptions": {
"agency": [{
"_id": ObjectId("5b50b984fd2b5637081105c8"),
"agency": "3dg",
"name": "ABC",
"address": "1 some street"
}],
"memberyType": "primary"
}
}
which is pretty close to what OP is looking for.
Note: there may be some edge cases, but with minor tweaks this solution should work.

Aggregate Populate array of ids with Their Documents

I'm struggling with some aggregation functions in MongoDB.
I want to get the book documents into the author's document, which holds only the book ids as an array of strings, like this:
Author Document
{
"_id" : "10",
"full_name" : "Joi Dark",
"books" : ["100", "200", "351"],
}
And other documents (books) :
{
"_id" : "100",
"title" : "Node.js In Action",
"ISBN" : "121215151515154",
"date" : "2015-10-10"
}
So as a result I want this:
{
"_id" : "10",
"full_name" : "Joi Dark",
"books" : [
{
"_id" : "100",
"title" : "Node.js In Action",
"ISBN" : "121215151515154",
"date" : "2015-10-10"
},
{
"_id" : "200",
"title" : "Book 2",
"ISBN" : "1212151454515154",
"date" : "2015-10-20"
},
{
"_id" : "351",
"title" : "Book 3",
"ISBN" : "1212151454515154",
"date" : "2015-11-20"
}
],
}
Use $lookup which retrieves data from the nominated collection in from based on the matching of the localField to the foreignField:
db.authors.aggregate([
{ "$lookup": {
"from": "books",
"foreignField": "_id",
"localField": "books",
"as": "books"
}}
])
The as option names the field where the "array" of related documents is written. If you specify an existing property (as is done here), that property is overwritten with the new array content in the output.
If you are running a MongoDB version older than 3.4, you may need to $unwind the "books" array used as the localField first:
db.authors.aggregate([
{ "$unwind": "$books" },
{ "$lookup": {
"from": "books",
"foreignField": "_id",
"localField": "books",
"as": "books"
}}
])
Which creates a new document for each array member in the original document, therefore use $unwind again and $group to create the original form:
db.authors.aggregate([
{ "$unwind": "$books" },
{ "$lookup": {
"from": "books",
"foreignField": "_id",
"localField": "books",
"as": "books"
}},
{ "$unwind": "$books" },
{ "$group": {
"_id": "$_id",
"full_name": { "$first": "$full_name" },
"books": { "$push": "$books" }
}}
])
If the _id values in the foreign collection are in fact of ObjectId type, but the values in the localField are "string" versions of them, then you need to convert the data so the types match. There is no other way.
Run something like this through the shell to convert:
var ops = [];
db.authors.find().forEach(doc => {
doc.books = doc.books.map( book => new ObjectId(book.valueOf()) );
ops.push({
"updateOne": {
"filter": { "_id": doc._id },
"update": {
"$set": { "books": doc.books }
}
}
});
if ( ops.length >= 500 ) {
db.authors.bulkWrite(ops);
ops = [];
}
});
if ( ops.length > 0 ) {
db.authors.bulkWrite(ops);
ops = [];
}
That will convert all the values in the "books" array into real ObjectId values that can actually match in a $lookup operation.
Just adding on top of the previous answer: if your input is an array of strings and you want to convert them to ObjectIds, you can map over the array with $map and the $toObjectId operator (available from MongoDB 4.0), and then run the $lookup. $addFields is used here rather than $project so that the other fields, such as full_name, are kept.
db.authors.aggregate([
{ $addFields: {
books: {
$map: {
input: '$books',
as: 'book',
in: { $toObjectId: '$$book' },
},
},
},},
{ $lookup: {
from: "books",
foreignField: "_id",
localField: "books",
as: "books"
}
},
])
Ideally, your database would be structured so that references are stored as ObjectIds, but where that is not an option, this is a viable solution.
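One caveat for either conversion approach: a string can only be turned into an ObjectId if it is a 24-character hex string, so values like "100" from the question's sample would make the conversion fail. If the ids are plain strings in both collections, no conversion is needed at all. A small guard such as the following (isObjectIdString is a hypothetical name, and the third sample value is made up) can help check the data first.

```javascript
// Hypothetical guard: true only for 24-character hex strings, the
// string form that ObjectId conversion accepts.
function isObjectIdString(value) {
  return typeof value === "string" && /^[0-9a-fA-F]{24}$/.test(value);
}

// "100" and "200" come from the question; the last value is made up.
const books = ["100", "200", "5b50b984fd2b5637081105c8"];
const convertible = books.filter(isObjectIdString);
const plain = books.filter(b => !isObjectIdString(b));
```

Only the values in convertible are safe to pass to new ObjectId(...) or $toObjectId.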

Export collection and replace field with field from another collection (aggregate?)

I'm using the MongoChef GUI, but a command-line solution is fine.
I have a collection structured like this:
Votes
{
"_id" : "5qgfddRubJ32pS48B",
"createdBy" : "HdKRfwzGriMMZgSQu",
"fellowId" : "yCaqt5nT3LQCBLj8j",
}
I need to first look up the user in a users collection using the createdBy field to see if they are verified
Users
{
"_id": "HdKRfwzGriMMZgSQu",
"emails" : [
{
"address" : "someuser#example.com",
"verified" : true
}
]
}
and additionally, get some more information from a third collection from fellowId
Fellows
{
"_id" : "yCaqt5nT3LQCBLj8j",
"title" : "Fellow Title"
}
And have them all export as one csv or json file. How can I achieve this as a mongo query/export?
The desired output would be, for example:
{
"_id" : "yCaqt5nT3LQCBLj8j",
"fellowTitle": "Fellow Title",
"isVerified" : true
}
You can run an aggregation with two $lookup stages to join both collections:
1 $lookup to join users
1 $unwind to remove users array
1 $unwind to remove user email array (as we have to check the verified flag)
1 $sort to sort with user.emails.verified
1 $group to actually pick only the first entry (verified or not)
1 $lookup to join fellows
1 $unwind to remove fellows array
1 $project to format whatever format you want at the end
1 $out to export to a new collection
Query is :
db.votes.aggregate([{
$lookup: {
from: "users",
localField: "createdBy",
foreignField: "_id",
as: "user"
}
}, {
$unwind: "$user"
}, {
$unwind: "$user.emails"
}, {
$sort: { "user.emails.verified": -1 }
}, {
$group: {
_id: "$_id",
createdBy: { $first: "$createdBy" },
fellowId: { $first: "$fellowId" },
user: { $first: "$user" }
}
}, {
$lookup: {
from: "fellows",
localField: "fellowId",
foreignField: "_id",
as: "fellow"
}
}, {
$unwind: "$fellow"
}, {
$project: {
"_id": 1,
"fellowTitle": "$fellow.title",
"isVerified": "$user.emails.verified"
}
}, {
$out: "results"
}])
Then export with :
mongoexport -d testDB -c results > results.json
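As a possible simplification of the pipeline above, assuming MongoDB 3.4 or later (where the $in aggregation operator is available): the $unwind on emails, the $sort and the $group exist only to find out whether any email is verified, and that can be computed directly in the $project. This is an untested sketch using the collection and field names from the question.

```javascript
// Sketch only: same joins as above, without unwinding user.emails.
const pipeline = [
  { $lookup: { from: "users", localField: "createdBy", foreignField: "_id", as: "user" } },
  { $unwind: "$user" },
  { $lookup: { from: "fellows", localField: "fellowId", foreignField: "_id", as: "fellow" } },
  { $unwind: "$fellow" },
  {
    $project: {
      fellowTitle: "$fellow.title",
      // "$user.emails.verified" resolves to an array such as [true, false];
      // $ifNull covers users that have no emails array at all.
      isVerified: { $in: [true, { $ifNull: ["$user.emails.verified", []] }] }
    }
  },
  { $out: "results" }
];

// In the mongo shell: db.votes.aggregate(pipeline)
```

The result is the same "true if any email is verified" flag that the sort-then-first approach produces, with one output document per vote.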