I'm trying an aggregation but I can't find the right pipeline to do it.
This is part of my document model:
//company.js
{
"_id" : "5dg8aa8c435b1e2868c841f6",
"name" : "My Corp",
"externalId" : "d7f348c9-c69b-69c4-923c-91458c53dc22",
"professionals_customers" : [
{
"company" : "6f4d01eb3b948150c2aad9c0"
},
{
"company" : "5dg7aa8c366b1e2868c841f6",
"contact" : "5df8ab5c355b1e2999c841f7"
}
],
}
I'm trying to return the professionals_customers fields hydrated with data, as a classic populate would do.
The company field comes from the companies collection and the contact field comes from the users collection.
The desired output should look like this:
{
"professionals_customers" : [
{
"company": {
"_id": "6f4d01eb3b948150c2aad9c0",
"name": "Transtar",
"externalId": "d7f386c9-c79b-49c5-905c-90750c42dc22",
},
},
{
"company": {
"_id": "5dg7aa8c366b1e2868c841f6",
"name": "Aperture",
"externalId": "d7f386c9-c69b-49c4-905c-90750c53dc22",
},
"contact" : {
"_id": "5df8ab5c355b1e2999c841f7",
"firstname": "Caroline",
"lastname": "Glados",
"externalId": "d7f386c9-c69b-49c4-905c-90750c53dc22", //same externalId as above, the user belongs to the company
},
}
]
}
At this point I've tried multiple solutions but I can't reach my goal.
let query = [{
$match : { _id : companyId }
},{
$lookup : {
from: 'companies',
localField : 'professionals_customers.company',
foreignField : '_id',
as : 'professionalsCustomers'
}
},{
$lookup : {
from: 'users',
localField : 'professionals_customers.contact',
foreignField : '_id',
as : 'contacts'
}
}]
At this point, I've got two new arrays with all the needed information, but I don't know how to group the right contact with the right company. Also, it may be easier to populate the data (with $lookup) while keeping the initial structure than to regroup professionalsCustomers and contacts through the shared externalId.
Additional information:
- A user who belongs to a company has the same externalId as that company.
- I don't want to use a classic populate, because I need to do some other operations afterwards.
Try this query :
db.companies.aggregate([
{ $match: { _id: companyId } },
{ $unwind: "$professionals_customers" },
{
$lookup: {
from: "companies",
localField: "professionals_customers.company",
foreignField: "_id",
as: "professionals_customers.company"
}
},
{
$lookup: {
from: "users",
localField: "professionals_customers.contact",
foreignField: "_id",
as: "professionals_customers.contact"
}
},
{
$addFields: {
"professionals_customers.company": {
$arrayElemAt: ["$professionals_customers.company", 0]
},
"professionals_customers.contact": {
$arrayElemAt: ["$professionals_customers.contact", 0]
}
}
},
{
$group: { _id: "$_id", professionals_customers: { $push: "$professionals_customers" }, data: { $first: "$$ROOT" } }
},
{ $addFields: { "data.professionals_customers": "$professionals_customers" } },
{ $replaceRoot: { newRoot: "$data" } }
])
Note: if needed, convert string fields or inputs to ObjectId(). $lookup and $match only match values of the same type, so check that the two fields being compared (or the input and the field in the DB) have matching types.
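As a variation that keeps the initial structure (the option raised in the question), here is a minimal sketch that looks up both collections into temporary arrays and then rebuilds professionals_customers with $map. The function name buildHydratePipeline and the temporary field names _companies and _contacts are assumptions made for illustration, and the pipeline has not been run against the asker's data. Each rebuilt element keeps only company and contact.

```javascript
// Sketch: hydrate professionals_customers in place, without $unwind/$group.
// _companies and _contacts are hypothetical temporary field names.
function buildHydratePipeline(companyId) {
  // Expression that picks the first document in `source` whose _id equals `ref`.
  // When `ref` is missing, nothing matches and the field is simply omitted.
  const findById = (source, ref) => ({
    $arrayElemAt: [
      { $filter: { input: source, cond: { $eq: ["$$this._id", ref] } } },
      0
    ]
  });

  return [
    { $match: { _id: companyId } },
    // Both lookups write to temporary arrays so the original array is untouched.
    { $lookup: {
        from: "companies",
        localField: "professionals_customers.company",
        foreignField: "_id",
        as: "_companies"
    } },
    { $lookup: {
        from: "users",
        localField: "professionals_customers.contact",
        foreignField: "_id",
        as: "_contacts"
    } },
    // Rebuild each element, pairing it with its own company and contact.
    { $addFields: {
        professionals_customers: {
          $map: {
            input: "$professionals_customers",
            as: "pc",
            in: {
              company: findById("$_companies", "$$pc.company"),
              contact: findById("$_contacts", "$$pc.contact")
            }
          }
        }
    } },
    // Drop the temporary arrays.
    { $project: { _companies: 0, _contacts: 0 } }
  ];
}
```

It would be used as db.companies.aggregate(buildHydratePipeline(companyId)). Because the document is never unwound, no $group or $replaceRoot is needed to put it back together.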
I have following collections in MongoDB
Profile Collection
> db.Profile.find()
{ "_id" : ObjectId("5ec62ccb8897af3841a46d46"), "u" : "Test User", "is_del": false }
Store Collection
> db.Store.find()
{ "_id" : ObjectId("5eaa939aa709c30ff4703ffd"), "id" : "5ec62ccb8897af3841a46d46", "a" : { "ci": "Test City", "st": "Test State" }, "ip" : false }, "op" : [ ], "b" : [ "normal" ], "is_del": false}
Item Collection
> db.Item.find()
{ "_id" : ObjectId("5ea98a25f1246b53a46b9e10"), "sid" : "5eaa939aa709c30ff4703ffd", "n" : "sample", "is_del": false}
The relations among these collections are defined as follows:
Profile -> Store: a 1:n relation. The id field in Store refers to the _id field in Profile.
Store -> Item: also a 1:n relation. The sid field in Item refers to the _id field in Store.
Now I need to write a query that finds all the stores of each profile, along with the count of items for each store. Documents with is_del set to true must be excluded.
I am trying it following way:
Query 1 to find the count of item for each store.
Query 2 to find the store for each profile.
Then in the application logic use both the result to produce the combined output.
I have query 1 as follows:
db.Item.aggregate({$group: {_id: "$sid", count:{$sum:1}}})
Query 2 is as follows:
db.Profile.aggregate([{ "$addFields": { "pid": { "$toString": "$_id" }}}, { "$lookup": {"from": "Store","localField": "pid","foreignField": "id", "as": "stores"}}])
The is_del check is also missing from these queries. Is there a simpler way to do all of this in a single query? If so, what will the scalability impact be?
You can use the pipeline form of $lookup (with let and a sub-pipeline), available from MongoDB v3.6:
db.Profile.aggregate([
{
$match: { is_del: false }
},
{
$lookup: {
from: "Store",
as: "stores",
let: {
pid: { $toString: "$_id" }
},
pipeline: [
{
$match: {
is_del: false,
$expr: { $eq: ["$$pid", "$id"] }
}
},
{
$lookup: {
from: "Item",
as: "items",
let: {
sid: { $toString: "$_id" }
},
pipeline: [
{
$match: {
is_del: false,
$expr: { $eq: ["$$sid", "$sid"] }
}
},
{
$count: "count"
}
]
}
},
{
$unwind: "$items"
}
]
}
}
])
To improve performance, I suggest you store the reference ids as ObjectId so you don't have to convert them in each step.
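For comparison, the two-query approach described in the question (combine both results in application logic) could look roughly like the sketch below. The sample arrays and the function name attachItemCounts are hypothetical; they only mimic the shape of the results of Query 1 and Query 2, and the sketch assumes deleted items were already excluded before the $group in Query 1.

```javascript
// Hypothetical result of Query 1: item count per store id (sid).
const sampleItemCounts = [{ _id: "5eaa939aa709c30ff4703ffd", count: 1 }];

// Hypothetical result of Query 2: profiles with their stores attached.
const sampleProfiles = [
  {
    _id: "5ec62ccb8897af3841a46d46",
    u: "Test User",
    is_del: false,
    stores: [
      { _id: "5eaa939aa709c30ff4703ffd", is_del: false },
      { _id: "deleted-store", is_del: true } // made-up deleted store
    ]
  }
];

// Attach an itemCount to every non-deleted store of every non-deleted profile.
function attachItemCounts(profiles, itemCounts) {
  // Index the counts once so each store lookup is O(1).
  const countBySid = new Map(itemCounts.map((c) => [String(c._id), c.count]));
  return profiles
    .filter((p) => !p.is_del)
    .map((p) => ({
      ...p,
      stores: p.stores
        .filter((s) => !s.is_del)
        // Stores without any item get a count of 0 instead of being dropped.
        .map((s) => ({ ...s, itemCount: countBySid.get(String(s._id)) || 0 }))
    }));
}
```

Note that in this sketch a store without items keeps a count of 0, whereas the $unwind on items in the single-query pipeline removes such stores from the result.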
I have a MongoDB database that is populated by a Spring application using Spring Data. I want to perform a manual query to join two collections and extract some statistics from this data.
The first collection is named emailCampaign and contains this information (simplified):
{
"_id" : ObjectId("5db85687307b0a0d184448db"),
"name" : "Welcome email",
"subject" : "¡Welcome {{ user.name }}!",
"status" : "Sent",
"_class" : "com.mycompany.EmailCampaign"
}
The second collection is named campaignDelivery and contains this information (simplified):
/* 1 */
{
"_id" : ObjectId("5db183fb307b0aef3113361f"),
"campaign" : {
"$ref" : "emailCampaign",
"$id" : ObjectId("5db85687307b0a0d184448db")
},
"deliveries" : 3,
"_class" : "com.mycompany.CampaignDelivery"
}
/* 2 */
{
"_id" : ObjectId("5db85f2c307b0a0d184448e1"),
"campaign" : {
"$ref" : "emailCampaign",
"$id" : ObjectId("5db85687307b0a0d184448db")
},
"deliveries" : 5,
"_class" : "com.mycompany.CampaignDelivery"
}
Ultimately I want to obtain the sum of both deliveries fields, but for now I'm stuck on the basic JOIN:
db.emailCampaign.aggregate([
{
$lookup: {
from: 'campaignDelivery',
localField: '_id',
foreignField: 'campaign.$id',
as: 'deliveries'
}
}
])
Throws the following error:
FieldPath field names may not start with '$'.
Escaping the dollar sign had no effect whatsoever, and I can't find any examples of fields that start with a dollar sign.
You can work around it by using the pipeline form of $lookup, with $objectToArray in the sub-pipeline to access campaign.$id:
db.emailCampaign.aggregate([
{ $lookup: {
from: "campaignDelivery",
let: { id: "$_id" },
pipeline: [
{ $addFields: {
refId: { $arrayElemAt: [
{ $filter: {
input: { $objectToArray: "$campaign" },
cond: { $eq: [ "$$this.k", { $literal: "$id" } ] }
} }
, 0
] }
} },
{ $match: {
$expr: { $eq: [
"$refId.v",
"$$id"
] }
} },
{ $project: {
refId: 0
} }
],
as: "deliveries"
} }
])
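To make the sub-pipeline easier to follow, here is a plain JavaScript sketch of what it computes on the two sample campaignDelivery documents: the DBRef is turned into key/value pairs, the pair whose key is "$id" is picked, and matching documents are kept. The helper names refIdOf and totalDeliveries are made up for illustration, ObjectIds are written as strings, and the final sum is the statistic the question is ultimately after.

```javascript
// ObjectId values are represented as plain strings in this sketch.
const sampleCampaignId = "5db85687307b0a0d184448db";

const sampleDeliveries = [
  { campaign: { $ref: "emailCampaign", $id: sampleCampaignId }, deliveries: 3 },
  { campaign: { $ref: "emailCampaign", $id: sampleCampaignId }, deliveries: 5 }
];

// Mirrors $objectToArray + $filter + $arrayElemAt: turn the DBRef into
// { k, v } pairs and return the value of the pair whose key is "$id".
function refIdOf(doc) {
  const pairs = Object.entries(doc.campaign).map(([k, v]) => ({ k, v }));
  const idPair = pairs.filter((pair) => pair.k === "$id")[0];
  return idPair ? idPair.v : undefined;
}

// Mirrors the $match on refId.v, then adds up the deliveries fields.
function totalDeliveries(campaignId, docs) {
  return docs
    .filter((doc) => refIdOf(doc) === campaignId)
    .reduce((sum, doc) => sum + doc.deliveries, 0);
}
```

In the aggregation itself, the same total could presumably be obtained by following the $lookup with an $addFields stage such as { total: { $sum: "$deliveries.deliveries" } }; the field name total is an assumption.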
We have a DB structure similar to the following:
Pet owners:
/* 1 */
{
"_id" : ObjectId("5baa8b8ce70dcbe59d7f1a32"),
"name" : "bob"
}
/* 2 */
{
"_id" : ObjectId("5baa8b8ee70dcbe59d7f1a33"),
"name" : "mary"
}
Pets:
/* 1 */
{
"_id" : ObjectId("5baa8b4fe70dcbe59d7f1a2a"),
"name" : "max",
"owner" : ObjectId("5baa8b8ce70dcbe59d7f1a32")
}
/* 2 */
{
"_id" : ObjectId("5baa8b52e70dcbe59d7f1a2b"),
"name" : "charlie",
"owner" : ObjectId("5baa8b8ce70dcbe59d7f1a32")
}
/* 3 */
{
"_id" : ObjectId("5baa8b53e70dcbe59d7f1a2c"),
"name" : "buddy",
"owner" : ObjectId("5baa8b8ee70dcbe59d7f1a33")
}
I need a list of all pet owners and additionally the number of pets they own. Our current query looks similar to the following:
db.getCollection('owners').aggregate([
{ $lookup: { from: 'pets', localField: '_id', foreignField: 'owner', as: 'pets' } },
{ $project: { '_id': 1, name: 1, numPets: { $size: '$pets' } } }
]);
This works, but it's quite slow, and I'm wondering whether there's a more efficient way to perform the query.
[update and feedback] Thanks for the answers. The solutions work, but unfortunately I can see no performance improvement compared to the query given above. Apparently MongoDB still needs to scan the entire pets collection. My hope was that the owner index (which is present) on the pets collection could somehow be exploited to get just the counts (without touching the pet documents), but this does not seem to be the case.
Are there any other ideas or solutions for a very fast retrieval of the 'pet count', besides explicitly storing the count within the owner documents?
From MongoDB 3.6 you can give $lookup a custom pipeline and return a count instead of the entire pets documents. Try:
db.owners.aggregate([
{
$lookup: {
from: "pets",
let: { ownerId: "$_id" },
pipeline: [
{ $match: { $expr: { $eq: [ "$$ownerId", "$owner" ] } } },
{ $count: "count" }
],
as: "numPets"
}
},
{
$unwind: "$numPets"
}
])
You can try the aggregation below:
db.owners.aggregate([
{ "$lookup": {
"from": "pets",
"let": { "ownerId": "$_id" },
"pipeline": [
{ "$match": { "$expr": { "$eq": [ "$$ownerId", "$owner" ] }}},
{ "$count": "count" }
],
"as": "numPets"
}},
{ "$project": {
"_id": 1,
"name": 1,
"numPets": { "$ifNull": [{ "$arrayElemAt": ["$numPets.count", 0] }, 0]}
}}
])
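The two answers above differ in how they treat owners without pets: $count emits no document when nothing matches, so numPets is an empty array for such owners. A minimal plain JavaScript sketch of the two final stages, using a hypothetical third owner "alice" who has no pets:

```javascript
// Shape of each owner right after the $lookup with $count.
// "alice" is a made-up owner without pets, so her numPets array is empty.
const afterLookup = [
  { name: "bob", numPets: [{ count: 2 }] },
  { name: "mary", numPets: [{ count: 1 }] },
  { name: "alice", numPets: [] }
];

// $unwind semantics: one output document per array element,
// so an owner with an empty array produces no output at all.
const unwound = afterLookup.flatMap((owner) =>
  owner.numPets.map((n) => ({ name: owner.name, numPets: n }))
);

// $arrayElemAt + $ifNull semantics: take the first count, default to 0.
const projected = afterLookup.map((owner) => ({
  name: owner.name,
  numPets: owner.numPets.length > 0 ? owner.numPets[0].count : 0
}));
```

Here unwound contains only bob and mary, while projected keeps alice with numPets equal to 0, which is why the second pipeline is the safer choice when a list of all owners is required.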
I am puzzled as to why the code below doesn't work. Can anyone explain, please?
For some context: My goal is to get the score associated with an answer option for a survey database where answers are stored in a separate collection from the questions. The questions collection contains an array of answer options, and these answer options have a score.
Running this query:
db.answers.aggregate([
{
$match: {
userId: "abc",
questionId: ObjectId("598be01d4efd70a81c1c5ad4")
}
},
{
$lookup: {
from: "questions",
localField: "questionId",
foreignField: "_id",
as: "question"
}
},
{
$unwind: "$question"
},
{
$unwind: "$question.options"
},
{
$unwind: "$answers"
}
])
I get:
{
"_id" : ObjectId("598e588e0c5e24452c9ee769"),
"userId" : "abc",
"questionId" : ObjectId("598be01d4efd70a81c1c5ad4"),
"answers" : {
"id" : 20
},
"question" : {
"_id" : ObjectId("598be01d4efd70a81c1c5ad4"),
"options" : {
"id" : 10,
"score" : "12"
}
}
}
{
"_id" : ObjectId("598e588e0c5e24452c9ee769"),
"userId" : "abc",
"questionId" : ObjectId("598be01d4efd70a81c1c5ad4"),
"answers" : {
"id" : 20
},
"question" : {
"_id" : ObjectId("598be01d4efd70a81c1c5ad4"),
"options" : {
"id" : 20,
"score" : "4"
}
}
}
All great. If I now add to the original query a match that's supposed to find the answer option having the same id as the answer (i.e. question.options.id == answers.id), things don't work as I would expect.
The final pipeline is:
db.answers.aggregate([
{
$match: {
userId: "abc",
questionId: ObjectId("598be01d4efd70a81c1c5ad4")
}
},
{
$lookup: {
from: "questions",
localField: "questionId",
foreignField: "_id",
as: "question"
}
},
{
$unwind: "$question"
},
{
$unwind: "$question.options"
},
{
$unwind: "$answers"
},
{
$match: {
"question.options.id": "$answers.id"
}
},
{
$project: {
_id: 0,
score: "$question.options.score"
}
}
])
This returns an empty result. But if I change the RHS of the $match from "$answers.id" to 20, it returns the expected score: 4. I tried everything I could think of, but couldn't get it to work and can't understand why it doesn't work.
I was able to get it to work with the following pipeline:
db.answers.aggregate([
{
$match: {
userId: "abc",
questionId: ObjectId("598be01d4efd70a81c1c5ad4")
}
},
{
$lookup: {
from: "questions",
localField: "questionId",
foreignField: "_id",
as: "question"
}
},
{
$unwind: "$question"
},
{
$unwind: "$question.options"
},
{
$unwind: "$answers"
},
{
$addFields: {
areEqual: { $eq: [ "$question.options.id", "$answers.id" ] }
}
},
{
$match: {
areEqual: true
}
},
{
$project: {
_id: 0,
score: "$question.options.score"
}
}
])
I think the reason it didn't work with a direct match is that $match uses query syntax, in which the right-hand side "$answers.id" is treated as a literal string rather than as a reference to the answers.id field. Comparing two fields of the same document requires an aggregation expression such as $eq: [ "$question.options.id", "$answers.id" ], hence the need for an extra helper attribute.
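One way to see the difference is to replay both comparisons in plain JavaScript on the two documents produced by the $unwind stages. The variable names are made up for illustration. The last constant shows a $match stage using $expr (available from MongoDB 3.6), which evaluates an aggregation expression and should therefore remove the need for the helper field; it is a sketch that has not been run against the original data.

```javascript
// The two documents that reach the $match stage (trimmed to the relevant fields).
const unwoundDocs = [
  { answers: { id: 20 }, question: { options: { id: 10, score: "12" } } },
  { answers: { id: 20 }, question: { options: { id: 20, score: "4" } } }
];

// Query syntax: the right-hand side is the literal string "$answers.id",
// so a numeric option id can never be equal to it.
const literalMatch = unwoundDocs.filter(
  (doc) => doc.question.options.id === "$answers.id"
);

// Expression syntax: both sides are resolved as field paths on the document.
const fieldMatch = unwoundDocs.filter(
  (doc) => doc.question.options.id === doc.answers.id
);

// Equivalent pipeline stage using $expr instead of $addFields + $match.
const exprMatchStage = {
  $match: { $expr: { $eq: ["$question.options.id", "$answers.id"] } }
};
```

Placing exprMatchStage where the $addFields and $match pair currently sits leaves the rest of the pipeline unchanged.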
I'm using the MongoChef GUI, but a command-line solution is fine too.
I have a collection structured as follows:
Votes
{
"_id" : "5qgfddRubJ32pS48B",
"createdBy" : "HdKRfwzGriMMZgSQu",
"fellowId" : "yCaqt5nT3LQCBLj8j",
}
I need to first look up the user in a users collection using the createdBy field to see if they are verified
Users
{
"_id": "HdKRfwzGriMMZgSQu",
"emails" : [
{
"address" : "someuser#example.com",
"verified" : true
}
]
}
and additionally, get some more information from a third collection from fellowId
Fellows
{
"_id" : "yCaqt5nT3LQCBLj8j",
"title" : "Fellow Title"
}
And have them all exported as one csv or json file. How can I achieve this as a mongo query/export?
The desired output would be, for example:
{
"_id" : "yCaqt5nT3LQCBLj8j",
"fellowTitle": "Fellow Title",
"isVerified" : true
}
You can perform an aggregation with 2 $lookup stages to join both collections:
1 $lookup to join users
1 $unwind to remove the users array
1 $unwind to remove the user emails array (as we have to check the verified flag)
1 $sort to sort with user.emails.verified
1 $group to actually pick only the first entry (verified or not)
1 $lookup to join fellows
1 $unwind to remove fellows array
1 $project to format whatever format you want at the end
1 $out to export to a new collection
Query is :
db.votes.aggregate([{
$lookup: {
from: "users",
localField: "createdBy",
foreignField: "_id",
as: "user"
}
}, {
$unwind: "$user"
}, {
$unwind: "$user.emails"
}, {
$sort: { "user.emails.verified": -1 }
}, {
$group: {
_id: "$_id",
createdBy: { $first: "$createdBy" },
fellowId: { $first: "$fellowId" },
user: { $first: "$user" }
}
}, {
$lookup: {
from: "fellows",
localField: "fellowId",
foreignField: "_id",
as: "fellow"
}
}, {
$unwind: "$fellow"
}, {
$project: {
"_id": 1,
"fellowTitle": "$fellow.title",
"isVerified": "$user.emails.verified"
}
}, {
$out: "results"
}])
Then export with :
mongoexport -d testDB -c results > results.json
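As a possible simplification, and assuming a vote should count as verified as soon as any of the user's emails is verified, the $unwind/$sort/$group steps on emails could be replaced by a single $in expression in the $project stage. This is a sketch, not a tested replacement: the function name buildVotesExportPipeline is made up, and it assumes emails is always an array when present.

```javascript
// Sketch: same joins as the query above, without unwinding the emails array.
// Assumes user.emails, when present, is an array of { address, verified }.
function buildVotesExportPipeline(outCollection) {
  return [
    { $lookup: {
        from: "users",
        localField: "createdBy",
        foreignField: "_id",
        as: "user"
    } },
    { $unwind: "$user" },
    { $lookup: {
        from: "fellows",
        localField: "fellowId",
        foreignField: "_id",
        as: "fellow"
    } },
    { $unwind: "$fellow" },
    { $project: {
        _id: 1,
        fellowTitle: "$fellow.title",
        // "$user.emails.verified" resolves to the array of verified flags;
        // $ifNull falls back to [] when the user has no emails field.
        isVerified: {
          $in: [true, { $ifNull: ["$user.emails.verified", []] }]
        }
    } },
    // Write the result to a collection that mongoexport can read.
    { $out: outCollection }
  ];
}
```

It would be run as db.votes.aggregate(buildVotesExportPipeline("results")). One behavioural difference: a user with an empty emails array is kept with isVerified set to false, whereas the $unwind on user.emails drops that vote.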