MongoDB aggregation to update more than one collection

I was wondering whether it's possible to create an aggregation that could modify three collections in one go.
For example, imagine that we have these collections:
MasterCollection:
{
_id: ObjectId;
}
CollectionOne:
{
_id: ObjectId;
_masterCollectionId: ObjectId;
_parentId: ObjectId;
_title: String;
_displayTitle: String;
}
CollectionTwo:
{
_id: ObjectId;
_masterCollectionId: ObjectId;
_parentId: ObjectId;
_displayTitle: String;
}
CollectionThree:
{
_id: ObjectId;
_masterCollectionId: ObjectId;
_parentId: ObjectId;
_displayTitle: String;
}
Aggregation to look up the children:
[
{
$match: {
_id: new ObjectId("618552b66f82e69572e9bf10")
}
},
{
$lookup: {
from: "CollectionOne",
let: {
"collectionOneId": "$_id"
},
pipeline: [{
"$match": {
"$expr": {
"$eq": ["$_parentId", "$$collectionOneId"]
}
}
}, {
"$lookup": {
"from": "CollectionTwo",
"let": {
"collectionTwoId": "$_id"
},
"pipeline": [{
"$match": {
"$expr": {
"$eq": ["$_parentId", "$$collectionTwoId"]
}
}
}, {
"$lookup": {
"from": "CollectionThree",
"let": {
"collectionThreeId": "$_id"
},
"pipeline": [{
"$match": {
"$expr": {
"$eq": ["$_parentId", "$$collectionThreeId"]
}
}
}],
"as": "CollectionThree"
}
}],
"as": "CollectionTwo"
}
}],
as: "CollectionOne"
}
}]
With the aggregation above I can get all children of a MasterCollection record.
In a similar way I could get the children of a CollectionOne record.
But I'm wondering whether there is a way to apply changes to each record within the pipeline.
For example, imagine that we need to change _masterCollectionId for a CollectionOne record.
If we change that property, we also need to update all child records (CollectionTwo / CollectionThree):
Update the targeted CollectionOne record.
Update the CollectionTwo records that point to the updated CollectionOne record.
Update the CollectionThree records that point to the updated CollectionTwo records.
Currently I loop over the records in application code and update everything, and it kind of works. But I think there should be a better solution, probably at the database layer, using MongoDB aggregations.
Can a MongoDB aggregation achieve the desired result?
Any help would be appreciated.

Unfortunately, I don't think what you're after is possible with aggregation alone in the way you've described. An aggregation pipeline can write to at most one collection (through a single, final $out or $merge stage), so it cannot modify three collections in one go.
What you can do in MongoDB 4.2 or newer is pass an aggregation pipeline as the update argument of updateMany(), which lets you use pipeline stages such as $set to compute the new values for the documents of that one collection.
I haven't tried this technique personally, but it sounds like it might do the job. You'd need to run updateMany() once per collection, which is still a big improvement on updating records one by one in a loop. See the updateMany() documentation for details.
Something like this:
await CollectionOne.updateMany(
{ _masterCollectionId: <oldID> },
[
{ $set: { "_masterCollectionId": <newID> } }
]
);
// And repeat for CollectionTwo and CollectionThree
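As an alternative to filtering by the old master id, the three steps listed in the question can be followed literally: update the one CollectionOne document, then the CollectionTwo documents whose _parentId points at it, then the CollectionThree documents whose _parentId points at those. This is a minimal sketch, assuming the official Node.js driver (a `Db` handle passed in as `db`) and the field names from the question; `cascadeMasterId` is a hypothetical helper name. It still issues one write per collection rather than one per record.

```javascript
// Sketch: cascade a new _masterCollectionId from one CollectionOne document
// down to its CollectionTwo children and CollectionThree grandchildren.
// Assumes `db` is a Node.js driver Db, or anything exposing the same
// collection().updateOne / updateMany / distinct methods.
async function cascadeMasterId(db, collectionOneId, newMasterId) {
  const update = { $set: { _masterCollectionId: newMasterId } };

  // 1. The targeted CollectionOne record.
  await db.collection("CollectionOne").updateOne({ _id: collectionOneId }, update);

  // 2. CollectionTwo records whose parent is that CollectionOne record.
  //    Collect their ids first so the next level can be found.
  const twoIds = await db
    .collection("CollectionTwo")
    .distinct("_id", { _parentId: collectionOneId });
  const two = await db
    .collection("CollectionTwo")
    .updateMany({ _parentId: collectionOneId }, update);

  // 3. CollectionThree records whose parent is one of those CollectionTwo records.
  const three = await db
    .collection("CollectionThree")
    .updateMany({ _parentId: { $in: twoIds } }, update);

  return { collectionTwo: two.modifiedCount, collectionThree: three.modifiedCount };
}
```

If the three writes must succeed or fail together, they could be run inside a multi-document transaction (MongoDB 4.0+ on a replica set) by passing a `{ session }` option to each call; that part is left out of the sketch.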

Related

MongoDB aggregate - return as separate objects

So I have 2 collections (tenants and campaigns) and I'm trying to compose a query to return 1 tenant and 1 campaign. As input, there is a tenant domain and a campaign slug. Since I first need the tenant _id to query the campaign (based on both tenantId and slug), an aggregation seems like the more performant option (compared to making 2 consecutive queries).
Technically speaking, I know how to do that:
[{
$match: { 'domains.name': '<DOMAIN_HERE>' },
}, {
$lookup: {
from: 'campaigns',
localField: '_id',
foreignField: 'tenantId',
as: 'campaign',
pipeline: [{
$match: { slug: '<SLUG_HERE>' },
}],
},
}]
which returns:
{
_id: ObjectId('...'),
campaign: [{
_id: ObjectId('...'),
}],
}
But it feels awkward, because the campaign is returned as a field of the tenant, and on top of that as a single item in an array. I know I can process and reformat the result programmatically afterwards, but is there any way to "hack" the aggregation to get a result that looks more like this?
{
tenant: {
_id: ObjectId('...'),
},
campaign: {
_id: ObjectId('...'),
},
}
This is just a simplified example; in reality this aggregation query is a bit more complicated (across more collections, on several of which I need to perform a very similar query), so it's not just about this one simple query. So the ability to return an aggregated document as a separate object, rather than as an array field on the parent document, would be quite helpful. If not, the world won't fall apart :)
To all those whom it may concern...
Thanks to answers from some good samaritans here, I've figured it out as a combination of $addFields, $project and $unwind. Extending my original aggregation query, the final pipeline would look like this:
[{
$match: { 'domains.name': '<DOMAIN_HERE>' },
}, {
$addFields: { tenant: '$$ROOT' },
}, {
$project: { _id: 0, tenant: 1 },
}, {
$lookup: {
from: 'campaigns',
localField: 'tenant._id',
foreignField: 'tenantId',
as: 'campaign',
pipeline: [{
$match: { slug: '<SLUG_HERE>' },
}],
},
}, {
$unwind: {
path: '$campaign',
preserveNullAndEmptyArrays: true,
},
}]
Thanks for the help! 😊
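The same { tenant, campaign } shape can also be produced with a single $replaceRoot stage instead of the $addFields plus $project pair, since { newRoot: { tenant: "$$ROOT" } } wraps the whole matched document in a tenant field. This is a minimal sketch, assuming the tenants/campaigns collections from the question; `tenantCampaignPipeline` is a hypothetical helper that turns the <DOMAIN_HERE>/<SLUG_HERE> placeholders into parameters. Note that combining localField/foreignField with a pipeline in one $lookup requires MongoDB 5.0 or newer.

```javascript
// Hypothetical helper: builds the tenant + campaign pipeline for a given
// domain and campaign slug, producing documents shaped { tenant, campaign }.
function tenantCampaignPipeline(domain, slug) {
  return [
    { $match: { "domains.name": domain } },
    // Wrap the matched tenant document; equivalent to
    // $addFields { tenant: "$$ROOT" } followed by $project { _id: 0, tenant: 1 }.
    { $replaceRoot: { newRoot: { tenant: "$$ROOT" } } },
    {
      $lookup: {
        from: "campaigns",
        localField: "tenant._id",
        foreignField: "tenantId",
        pipeline: [{ $match: { slug } }],
        as: "campaign",
      },
    },
    // Turn the one-element array into a plain object; keep the tenant
    // even when no campaign matches (campaign is then absent).
    { $unwind: { path: "$campaign", preserveNullAndEmptyArrays: true } },
  ];
}
```

In mongosh it would be used as db.tenants.aggregate(tenantCampaignPipeline(domain, slug)), with the domain and slug values supplied by the caller.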

How to create a mongo view providing data only for the latest as at date

In MSSQL the view can easily be created by running 2 queries:
The first one gets the latest date available.
The second one extracts all records as at that date.
How can I achieve the same result in Mongo? The Mongo collection is indexed by AsAtDate and Id.
From my research, Mongo only supports views defined by a single aggregation pipeline, so the two queries can't simply be run one after the other.
Example:
db.getCollection('MyCollection').aggregate([
{ "$lookup":
{
"from": "CollectionAsAtDate",
"pipeline": [
{ "$sort": { "AsAtDate": -1 } },
{ "$limit": 1 },
{ "$project": { "_id": 0, "AsAtDate": 1 } }
],
"as": "latest"
}
},
{"$unwind":"$latest"},
{
"$match": {
"$expr": { "$eq": ["$AsAtDate", "$latest.AsAtDate"] }
}
}
])
This piece of code is the best solution I have found so far, but it is still not efficient.
It takes 10+ seconds to get data from a collection by already indexed fields!
The reason is that the pipeline finds the latest date and attaches it to every document, then checks whether AsAtDate == latest.AsAtDate and discards the document if not. It works, but it requires far more resources than a simple index lookup!
How about moving the "sort, limit 1" operation out of the sub-pipeline and doing it first? That way the index on AsAtDate can be used and you are left with only 1 document (i.e. the one with the latest AsAtDate). Instead of attaching the latest date to every document (an n-to-n $lookup), you then perform a single 1-to-n $lookup from that one document back into the collection.
db.collection.aggregate([
{
"$sort": {
"AsAtDate": -1
}
},
{
"$limit": 1
},
{
"$project": {
"_id": 0,
"AsAtDate": 1
}
},
{
"$lookup": {
"from": "collection",
"localField": "AsAtDate",
"foreignField": "AsAtDate",
"as": "latest"
}
},
{
"$unwind": "$latest"
},
{
"$replaceRoot": {
"newRoot": "$latest"
}
}
])
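Since the question asks for a view, the same pipeline can be handed to db.createView() in mongosh. This is a minimal sketch, assuming the source collection is named MyCollection as in the question; `latestAsAtPipeline` and the view name `MyCollectionLatest` are hypothetical. Because a view re-runs its pipeline on every read, the leading $sort on AsAtDate should be able to use the existing AsAtDate index each time.

```javascript
// Hypothetical helper: returns the "latest AsAtDate only" pipeline for a
// collection, with the $lookup pointing back at that same collection.
function latestAsAtPipeline(collectionName) {
  return [
    { $sort: { AsAtDate: -1 } }, // an index on AsAtDate can serve this sort
    { $limit: 1 },               // keep just the newest document
    { $project: { _id: 0, AsAtDate: 1 } },
    {
      $lookup: {
        from: collectionName,    // self-join on the latest date
        localField: "AsAtDate",
        foreignField: "AsAtDate",
        as: "latest",
      },
    },
    { $unwind: "$latest" },
    { $replaceRoot: { newRoot: "$latest" } },
  ];
}

// In mongosh (assumed names):
// db.createView("MyCollectionLatest", "MyCollection", latestAsAtPipeline("MyCollection"));
```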

MongoDB aggregate different collections

I've been trying to do this aggregation since yesterday with no luck; I hope you can give me some ideas. A little background: I have 2 collections, one for results and another for questions. A result is basically where people solve questions, so it can have 1 question or up to 99 if I'm not mistaken. This is the simplified schema:
Results Schema:
_id: ObjectId("6010664ac5c4f77f26f5d005")
questions: [
{
_id: ObjectId("5d2627c94bb703bfcc910763"),
correct: false
},
{
_id: ObjectId("5d2627c94bb703bfcc910764"),
correct: true
},
{
_id: ObjectId("5d2627c94bb703bfcc910765"),
correct: true
}
]
So, on this specific object, the user answered 3 questions and got 2 of them correct.
Questions Schema:
_id: ObjectId("5d2627c94bb703bfcc910763")
What I'm struggling to do is: for each document in the Questions collection, I have to check whether the question was answered, i.e. whether some _id in a Result's questions array equals the _id in the Questions Schema. (There can be multiple answered questions with the same _id as a Questions document, but the Questions Schema _id is unique.) If the question was answered, I need to check whether correct is true; if so, I add it to a correctAnswer count, otherwise to wrongAnswer.
I've tried many things so far: conditions, $group to get the $sum of correct and wrong answers, $lookup to join both collections, but so far I can't even get the aggregation result to show.
Here is one of the things I tried (I was trying to take baby steps first), but as mentioned before, I couldn't even get the result printed:
Result.aggregate([
{
$lookup: {
from: 'question',
localField: 'questions._id',
foreignField: '_id',
as: 'same'
}
}
])
But this gets me both collections combined, and 'same' comes back as an empty array. I also tried using $match, with no luck.
I also did a $project to get just the information I wanted:
$project: {
_id: 0,
questions: {
_id: 1,
correct: 1
}
},
Tried using $group:
$group: {
_id: "$_id",
$cond: {if: {"$correct": { $eq: true}}, then: {testField: {$sum: 1}}, else: {testField: 0}}
}
As I said, I was just trying to take baby steps, so testField was being set manually. I also tried many other things from Stack Overflow.
I'd appreciate the help, and sorry for the very long text; I just wanted to include some examples of what I tried.
TL;DR: For each question in the Questions Schema, I need to find the Results questions whose _id matches it and check whether each one has correct: true or correct: false, then update the Questions Schema with how many answers were correct and how many were wrong.
Example: newField: {correctAnswer: 4, wrongAnswer: 3}. In this case, 7 entries from the Result questions arrays matched an _id from the Question Schema; 4 had correct: true and 3 had correct: false. The same goes for the rest of the Question Schema.
For a scenario where "$lookup" can't be used because the Question collection is in a different database, the Result collection may be used to generate output documents to update the Question collection.
Here's one way to do it.
db.Result.aggregate([
{
"$unwind": "$questions"
},
{
"$group": {
"_id": "$questions._id",
"correct": {
"$push": "$questions.correct"
}
}
},
{
"$project": {
"newField": {
"correctAnswers": {
"$size": {
"$filter": {
"input": "$correct",
"as": "bool",
"cond": "$$bool"
}
}
},
"wrongAnswers": {
"$size": {
"$filter": {
"input": "$correct",
"as": "bool",
"cond": { "$not": "$$bool" }
}
}
}
}
}
}
])
Try it on mongoplayground.net.
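To make it clearer what the $unwind / $group / $filter stages above compute, here is the same counting done in plain JavaScript over in-memory Result documents. It only illustrates the pipeline's logic (ids are plain strings for simplicity) and is not a replacement for running it in MongoDB; `countAnswers` is a hypothetical name.

```javascript
// Mirrors the pipeline above: $unwind questions, $group by question _id,
// then count true / false entries in the pushed `correct` array.
function countAnswers(results) {
  const byQuestion = new Map(); // question _id -> array of `correct` values
  for (const result of results) {
    for (const q of result.questions) {          // $unwind: "$questions"
      if (!byQuestion.has(q._id)) byQuestion.set(q._id, []);
      byQuestion.get(q._id).push(q.correct);     // $push: "$questions.correct"
    }
  }
  const out = [];
  for (const [id, correct] of byQuestion) {
    out.push({
      _id: id,
      newField: {
        correctAnswers: correct.filter((b) => b).length, // cond: "$$bool"
        wrongAnswers: correct.filter((b) => !b).length,  // cond: { $not: "$$bool" }
      },
    });
  }
  return out;
}

const results = [
  { _id: "r1", questions: [{ _id: "q1", correct: false }, { _id: "q2", correct: true }] },
  { _id: "r2", questions: [{ _id: "q1", correct: true }] },
];
// q1: 1 correct, 1 wrong; q2: 1 correct, 0 wrong
console.log(countAnswers(results));
```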
I don't know of a way to "$lookup" and update at the same time. There's probably a better way to do this, but the aggregation pipeline below creates the documents that could be used in a subsequent update. The pipeline correctly counts repeated answers to the same question within a single Result document, in case someone keeps trying a question until they get it right. One possible issue is that $unwind drops a question whose results array is empty, so a question with no Result answers gets no "newField": { "correctAnswers": 0, "wrongAnswers": 0 } document.
db.Question.aggregate([
{
// lookup documents in Result that match _id
"$lookup": {
"from": "Result",
"localField": "_id",
"foreignField": "questions._id",
"as": "results"
}
},
{
// one document per Result that contains this question
"$unwind": "$results"
},
{
// one document per entry in that Result's questions array
"$unwind": "$results.questions"
},
{
// only keep answers that match question
"$match": {
"$expr": { "$eq": [ "$_id", "$results.questions._id" ] }
}
},
{
// reassemble and count correct/wrong answers
"$group": {
"_id": "$_id",
"correct": {
"$sum": {
"$cond": [ { "$eq": [ "$results.questions.correct", true ] }, 1, 0 ]
}
},
"wrong": {
"$sum": {
"$cond": [ { "$eq": [ "$results.questions.correct", false ] }, 1, 0 ]
}
}
}
},
{
// project what you want as output
"$project": {
newField: {
correctAnswers: "$correct",
wrongAnswers: "$wrong"
}
}
}
])
Try it on mongoplayground.net.
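One way around the missing zero counts is to avoid $unwind entirely and count inside each Question document with array expressions: flatten the looked-up results' questions arrays with $reduce and $concatArrays, keep only the entries whose _id matches, and take the $size of the true and false subsets. Questions with no answers then get correctAnswers: 0 and wrongAnswers: 0. This is an untested sketch (it needs MongoDB 4.2+ for $set), wrapped in a hypothetical `questionStatsPipeline()` helper and assuming the same Question/Result collection names.

```javascript
// Hypothetical helper: per-question correct/wrong counts without $unwind,
// so questions that were never answered still produce a document.
function questionStatsPipeline() {
  // Counts entries of $answers whose `correct` equals the given value.
  const countWhere = (value) => ({
    $size: {
      $filter: { input: "$answers", as: "a", cond: { $eq: ["$$a.correct", value] } },
    },
  });
  return [
    { $lookup: { from: "Result", localField: "_id", foreignField: "questions._id", as: "results" } },
    {
      $set: {
        answers: {
          $filter: {
            // "$results.questions" is an array of arrays; flatten it first.
            input: {
              $reduce: {
                input: "$results.questions",
                initialValue: [],
                in: { $concatArrays: ["$$value", "$$this"] },
              },
            },
            as: "q",
            cond: { $eq: ["$$q._id", "$_id"] }, // only answers to this question
          },
        },
      },
    },
    { $project: { newField: { correctAnswers: countWhere(true), wrongAnswers: countWhere(false) } } },
  ];
}
```

It would be run as db.Question.aggregate(questionStatsPipeline()).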

What's the best way to manage large ObjectID arrays in mongoose / mongo

In this case:
const mongoose = require("mongoose");
const PostSchema = new mongoose.Schema({
"content": {
type: String,
required: true
},
"user": {
type: mongoose.Schema.Types.ObjectId,
required: true,
ref: "User"
},
"created": {
type: Date,
default: Date.now // pass the function itself so each post gets its own creation time
},
"comments": [{
type: mongoose.Schema.Types.ObjectId,
ref: 'Comment'
}]
})
I want to be able to get 10 comments at a time, but I see no way to do that without having to get all the comments every time.
You can use a $lookup with a pipeline to join the collections and limit the result to 10. Here is an example; I used strings for _id to make it easier to follow.
$lookup has two forms: the equality match on localField/foreignField, and the form with let and pipeline used here, which runs a sub-pipeline on the joined collection. $match conditionally joins documents; to compare against a variable defined in let, you must use $expr inside the $match. $limit caps the number of joined documents (without a $sort, which 10 come back is not guaranteed). If you need to, you can add more stages to the sub-pipeline.
Here is the script
db.PostSchema.aggregate([
{
"$lookup": {
"from": "Comment",
let: {
cId: "$comments"
},
"pipeline": [
{
$match: {
$expr: {
$in: ["$_id", "$$cId"]
}
}
},
{
$limit: 10
}
],
"as": "comments"
}
}
])
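To fetch the comments 10 at a time, page after page, rather than only the first 10, the sub-pipeline can also sort and skip. This is a minimal sketch, assuming the same collection names as the script above and that sorting by _id is acceptable as creation order (ObjectIds start with a timestamp); `commentsPagePipeline`, `postId` and `page` are hypothetical names.

```javascript
// Hypothetical helper: one post plus page `page` (0-based) of its comments.
function commentsPagePipeline(postId, page, pageSize = 10) {
  return [
    { $match: { _id: postId } },
    {
      $lookup: {
        from: "Comment",
        let: { cId: "$comments" },
        pipeline: [
          { $match: { $expr: { $in: ["$_id", "$$cId"] } } },
          { $sort: { _id: -1 } },     // newest first; a stable order keeps pages from overlapping
          { $skip: page * pageSize }, // skip the earlier pages
          { $limit: pageSize },
        ],
        as: "comments",
      },
    },
  ];
}
```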

How to $lookup/populate an embedded document that is inside an array?

Below is what my schema looks like.
const CommentSchema = new mongoose.Schema({
commentText:{
type:String,
required: true
},
arrayOfReplies: [{
replyText:{
type:String,
required: true
},
replier: [{
type: mongoose.Schema.Types.ObjectId,
ref: 'User',
required: true,
}],
}],
});
How can I get query results that look like below:
[
{
commentText: 'comment text',
arrayOfReplies: [
{
replyText: 'replyText',
replier: {
username: "username",
bio: 'bio'
}
}
]
}
]
I am trying to populate the replier field inside the array arrayOfReplies. I have tried several variations of the aggregation query below. The ones that have come closest to what I am trying to achieve have one shortcoming: comments that do not have replies get an arrayOfReplies array containing an empty object, i.e. arrayOfReplies: [{}], meaning that the array is not empty.
I have tried $addFields and $mergeObjects, among other pipeline operators, but to no avail.
How to $lookup/populate the replier document that is inside the arrayOfReplies array?
Below is a template of the main part of my aggregation query, minus the attempt to populate the replier document.
Comment.aggregate([
{$unwind: {"path": '$arrayOfReplies', "preserveNullAndEmptyArrays": true }},
{$lookup:{from:"users",localField:"arrayOfReplies.replier",foreignField:"_id",as:"replier"}},
{$unwind: {"path": "$replier", "preserveNullAndEmptyArrays": true }},
{$group: {
_id : '$_id',
commentText:{$first: '$commentText'},
userWhoPostedThisComment:{$first: '$userWhoPostedThisComment'},
arrayOfReplies: {$push: '$arrayOfReplies' },
}},
After your lookup stage, each document will have
{
commentText: "text",
arrayOfReplies: <single reply, with replier ID>,
replier: [<looked up replier data>]
}
Use an $addFields stage to move that replier data inside the reply object before the group, like:
{$addFields: {"arrayOfReplies.replier":"$replier"}}
Then your group stage will rebuild arrayOfReplies like you want.
You can use the following aggregate:
Comment.aggregate([
{
$unwind: {
"path": "$arrayOfReplies",
"preserveNullAndEmptyArrays": true
}
},
{
$lookup: {
from: "users",
localField: "arrayOfReplies.replier",
foreignField: "_id",
as: "replier"
}
},
{
$addFields: {
"arrayOfReplies.replier": {
$arrayElemAt: [
"$replier",
0
]
}
}
},
{
$project: {
"replier": 0
}
},
{
$group: {
_id: "$_id",
"arrayOfReplies": {
"$push": "$arrayOfReplies"
},
commentText: {
"$first": "$commentText"
}
}
}
]);
None of the answers provided solved the issue as stated in the question:
"I am trying to populate the replier field inside the array arrayOfReplies. I have tried several variations of the aggregation query below. The ones that have come close to what I am trying to achieve have one short-coming. The comments that do not have replies have an arrayOfReplies array that has an empty object. I.e arrayOfReplies: [{}], essentially meaning that the array is not empty."
I wanted an aggregation that returns an empty array (not an array with an empty object) when the array is empty.
I was able to achieve what I wanted by using the code below:
arrayOfReplies:
{$cond:{
if: { $eq: ['$arrayOfReplies', {} ] },
then: "$$REMOVE",
else: {
_id : '$arrayOfReplies._id',
replyText:'$arrayOfReplies.replyText',
}
}}
If you combine the code above with @SuleymanSah's answer, you get the full working code.
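Another way to reach the same result is to leave the earlier stages as they are and clean the array up after the $group, dropping the empty {} placeholders with $filter. This is a minimal sketch, assuming (as described in the question) that comments without replies end up with exactly arrayOfReplies: [{}] after the $group; the `dropEmptyReplies` stage and the `withoutEmptyReplies` helper are hypothetical additions, not part of either answer.

```javascript
// Appended after the $group stage: removes the {} entries that comments
// without replies end up with, so arrayOfReplies becomes [] for them.
const dropEmptyReplies = {
  $set: {
    arrayOfReplies: {
      $filter: {
        input: "$arrayOfReplies",
        as: "reply",
        cond: { $ne: ["$$reply", {}] }, // documents compare field by field, so only {} equals {}
      },
    },
  },
};

// Hypothetical helper: returns a copy of `pipeline` with the clean-up stage appended.
function withoutEmptyReplies(pipeline) {
  return [...pipeline, dropEmptyReplies];
}
```

Appending rather than editing the $group keeps the original pipeline intact, so the clean-up can be dropped again if the data never contains reply-less comments.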