Combined lookup and embed in the same MongoDB collection depending on root document structure - mongodb

I want to make a conditional lookup with Mongo in the following way: a root document contains either a link (customerId) to the customers collection or directly embeds a customer, like this:
{
"_id" : 1,
"item": "item1",
"customer": { _id: 1, "name": "Jane Doe" }
},
{
"_id": 2,
"item": "item2",
"customerId": 1
}
customers collection:
{ _id: 1, "name": "Jane Johnson" }
The customers collection stores the current version of each customer; to maintain consistency, documents in the items collection normally contain only the customer's id. But if I want to freeze an item so that it holds the version of its customer at a certain time, I embed that customer directly into the item in question.
When searching for items I want them to appear uniformly (i.e. regardless of whether the customer is looked up or embedded, it appears as an embedded field):
e.g.
[{
"_id" : 1,
"item": "item1",
"customer": { _id: 1, "name": "Jane Doe" } // historical version of Jane (embedded)
},
{
"_id": 2,
"item": "item2",
"customer": { _id: 1, "name": "Jane Johnson" } // current version of Jane by lookup
}]
Question 1: is this the right approach and if not what is the best practice for handling cases like this?
Question 2: if my approach is correct, how to best use aggregation framework to achieve this?
Thanks!

The answers:
Yes, you can achieve it with the aggregation framework, and it is one of the possible solutions (the other would be to implement it in your program).
Just use the $lookup (to merge in data from the other collection) and $project (to generate the final result) pipeline stages.
Query example:
db.getCollection('items').aggregate([
{
$lookup: {
from: "customers",
localField: "customerId",
foreignField: "_id",
as: "customerData"
}
},
{
$project: {
"item": 1,
"customer": {
$cond: {
if: {$gt: [{$size: "$customerData"}, 0]},
then: {$arrayElemAt: ["$customerData", 0]},
else: "$customer"
}
}
}
}
]);
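The answer also mentions doing the merge in the program instead. A minimal sketch of that alternative follows, using plain in-memory arrays in place of real query results. The helper name resolveCustomers is hypothetical, and the code assumes primitive _id values (numbers or strings); ObjectId values would need converting to strings before being used as Map keys.

```javascript
// Hypothetical helper: resolve each item's customer in application code.
// `items` and `customers` stand in for the results of two find() calls.
function resolveCustomers(items, customers) {
  // Index current customers by _id for quick lookups.
  const byId = new Map(customers.map((c) => [c._id, c]));
  return items.map((item) => {
    const lookedUp = byId.get(item.customerId);
    return {
      _id: item._id,
      item: item.item,
      // Same precedence as the $cond above: the looked-up customer if found,
      // otherwise the embedded one.
      customer: lookedUp !== undefined ? lookedUp : item.customer,
    };
  });
}

const items = [
  { _id: 1, item: "item1", customer: { _id: 1, name: "Jane Doe" } },
  { _id: 2, item: "item2", customerId: 1 },
];
const customers = [{ _id: 1, name: "Jane Johnson" }];

console.log(JSON.stringify(resolveCustomers(items, customers)));
```

With the sample data from the question, item1 keeps its embedded "Jane Doe" and item2 receives the current "Jane Johnson".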

Related

Remove documents not linked by DBRef mongodb

Hello everyone, I am trying to remove all documents that are not referenced by a DBRef field.
For example I have 2 collections User and Order.
Let's say User collection has this structure :
{
"_id": "12345",
"lastName": "Michael",
"firstName": "Bernard"
}
And Order collection has this structure :
{
"_id": "123456",
"orderDate": "2022-12-26",
"userWhoOrder": {
"$ref": "user",
"$id": "12345"
},
"userWhoPay": {
"$ref": "user",
"$id": "123456"
}
}
Note: userWhoOrder and userWhoPay are DBRef fields (and they can refer to the same user).
So I want to remove all users who are not present in the order collection (neither as userWhoOrder nor as userWhoPay).
I know I can do this in 2 steps:
Get the list of userWhoOrder and userWhoPay ids from the order collection.
Filter the users list down to those not contained in the list from step 1 and remove them.
But I want to know if there is a proper way to do this with a single request (using $lookup for example).
Here is what I tried in order to get the list of users to remove:
db.getCollection("user").aggregate([{$lookup: {
from: "order",
let: {userId: "$_id"},
pipeline: [
{$match: {$expr: {
$and: [
{$ne: ["$userWhoOrder.$id", "$$userId"]},
{$ne: ["$userWhoPay.$id", "$$userId"]}
]
}}}
],
as: "result"
}}])
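The two steps described in the question can be sketched in plain JavaScript over in-memory arrays, which may help pin down the intended logic before expressing it as a single pipeline. The function name findUnreferencedUserIds and the third sample user are illustrative only; DBRefs are written here as plain objects with $ref and $id keys.

```javascript
// Hypothetical helper modelling the two-step approach on in-memory data.
function findUnreferencedUserIds(users, orders) {
  // Step 1: collect every user id referenced by either DBRef field.
  const referenced = new Set();
  for (const order of orders) {
    for (const field of ["userWhoOrder", "userWhoPay"]) {
      if (order[field]) referenced.add(order[field]["$id"]);
    }
  }
  // Step 2: keep only the ids of users that no order references.
  return users.filter((u) => !referenced.has(u._id)).map((u) => u._id);
}

const users = [{ _id: "12345" }, { _id: "123456" }, { _id: "99999" }];
const orders = [
  {
    _id: "123456",
    userWhoOrder: { $ref: "user", $id: "12345" },
    userWhoPay: { $ref: "user", $id: "123456" },
  },
];

console.log(findUnreferencedUserIds(users, orders));
```

Here only "99999" is reported, since the other two users are referenced by the order. The resulting ids could then be used in a delete filter with $in on _id.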

multi-stage aggregation pipeline matching data based on fields retrieved through $lookup

I'm trying to build a complex, nested aggregation pipeline in MongoDB (4.4.9 Community Edition, using the pymongo driver for Python 3.10).
There are relevant data points in different collections which I want to aggregate into one new view (ideally) or, if that doesn't work, a new collection.
The collections, and the relevant fields therein, follow a hierarchy. There is members, which contains the top-level key on which other data is to be merged, membershipNumber.
> members.find_one()
{'_id': ObjectId('61153299af6122XXXXXXXXXXXXX'), 'membershipNumber': 'N03XXXXXX'}
Then, there's a different collection, which contains membershipNumber, but also a different, linked field, an_user_id. an_user_id is used in other collections to denote records/fields in arrays that pertain to that particular user.
I 'join' members and an_users like so:
result = members.aggregate([
{
'$lookup': {
'from': 'an_users',
'localField': 'membershipNumber',
'foreignField': 'memref',
'as': 'an_users'
}
},
{ '$unwind' : '$an_users' },
{
'$project' : {
'_id' : 1,
'membershipNumber' : 1,
'an_user_id' : '$an_users.user_id'
}
}
]);
So far so good, this returns the desired, aggregated record:
{'_id': ObjectId('61153253aBBBBBBBBBBBB'),
'membershipNumber': 'N0XXXXXXXX',
'an_user_id': '48XXXXXX'}
Now, I have a third collection, emails, which contains the an_user_id as a string in arrays, denoting that the user clicked a given email. Each record is an email, and the an_user_ids in the click array are the users that clicked a link in that email.
{'_id': ObjectId('blah'),
'email_id': '407XXX',
'actions_count': 17,
'administrative_title': 'test',
'bounce': ['3440XXXX'],
'click': ['38294CCC',
'418FFFF',
'48XXXXXX',
'38eGGGG']}
I want to count the number of occurrences of a given an_user_id (which I've obtained from the aggregation) in arrays (e.g. clicks, bounces, opens) in the emails collection, and include it in the .aggregate call, to retrieve something like this:
{'_id': ObjectId('61153253aBBBBBBBBBBBB'),
'membershipNumber': 'N0XXXXXXXX',
'an_user_id': '48XXXXXX',
'n_email_clicks' : 412,
'n_email_bounces' : 12
}
Further, I might want to also attach counts of an_user_id in other collections in my DB.
Consider, e.g., this collection called events:
{
"_id": "617ffa96ee11844e143a63dd",
"id": "12345",
"administrative_title": "my_event",
"created_at": {
"$date": "2020-01-15T16:28:50.000Z"
},
"event_creator_id": "123456",
"event_title": "my_event",
"group_id": "123456",
"permalink": "event_id",
"rsvp_count": 54,
"rsvps": [{
"rsvp_id": "56789",
"display_name": "John Doe",
"rsvp_user_id": "48XXXXXX",
"rsvp_created_at": {
"$date": "2020-01-28T15:38:50.000Z"
},
"rsvp_updated_at": {
"$date": "2020-01-28T15:38:50.000Z"
},
"first_name": "John",
"last_name": "Doe",
}, {
"rsvp_id": "543895",
"display_name": "James Appleslice",
"rsvp_user_id": "N03XXXXXX",
"rsvp_created_at": {
"$date": "2020-02-05T13:15:14.000Z"
},
"rsvp_updated_at": {
"$date": "2020-02-05T13:15:14.000Z"
},
"first_name": "James",
"last_name": "Appleslice"}
]
}
So, the end-product would look something like this:
{'_id': ObjectId('61153253aBBBBBBBBBBBB'),
'membershipNumber': 'N0XXXXXXXX',
'an_user_id': '48XXXXXX',
'n_email_clicks' : 412,
'n_email_bounces' : 12,
'n_rsvps' : 12
}
My idea was to use the $lookup stage. However, I only know how to use it for matching on fields that exist in the parent collection I'm aggregating on, not on fields that have been generated in the course of the aggregation.
Any help would be hugely appreciated!
You could use $lookup with a pipeline. First you would $lookup the user id, followed by another $lookup to check whether the user id exists in emails. Lastly, a few more stages collect the results and format them as you need. Furthermore, you can add an $out stage if you would like to write the results into another collection.
db.members.aggregate([{
$lookup: {
from: "an_users",
let: {
membershipNumber: "$membershipNumber"
},
pipeline: [
{
$match: {
$expr: {
$eq: [
"$memref",
"$$membershipNumber"
]
},
}
},
{
"$lookup": {
"from": "emails",
"localField": "user_id",
"foreignField": "click",
"as": "clicks"
}
},
{
"$project": {
"_id": 1,
"membershipNumber": 1,
"an_user_id": "$user_id",
"n_email_clicks": {
$size: "$clicks"
}
}
}
],
as: "details"
}
},
{
$replaceRoot: {
newRoot: {
$mergeObjects: [
{
$arrayElemAt: [
"$details",
0
]
},
"$$ROOT"
]
}
}
},
{
$project: {
details: 0
}
}])
Working example - https://mongoplayground.net/p/yrFsNp44hpi
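The pipeline above only counts clicks. One possible way to add the other counts from the question is to repeat the inner $lookup once per array. The sketch below builds such stages as a plain JavaScript array. The helper name buildCountStages and the "_docs" suffix are invented for illustration, and the field paths bounce and rsvps.rsvp_user_id are taken from the sample documents in the question; this has not been run against real data.

```javascript
// Hypothetical helper: one $lookup per array to count, then a $project
// that turns each joined array into a number with $size.
function buildCountStages(counts) {
  const lookups = counts.map((c) => ({
    $lookup: {
      from: c.from,
      localField: "user_id",
      foreignField: c.foreignField,
      as: c.name + "_docs", // temporary array of matching documents
    },
  }));
  const project = { _id: 1, an_user_id: "$user_id" };
  for (const c of counts) {
    project[c.name] = { $size: "$" + c.name + "_docs" };
  }
  return [...lookups, { $project: project }];
}

const countStages = buildCountStages([
  { name: "n_email_clicks", from: "emails", foreignField: "click" },
  { name: "n_email_bounces", from: "emails", foreignField: "bounce" },
  { name: "n_rsvps", from: "events", foreignField: "rsvps.rsvp_user_id" },
]);

console.log(JSON.stringify(countStages, null, 2));
```

These stages would take the place of the $lookup and $project stages that follow $match inside the outer pipeline. Note that each count is the number of matching documents (emails or events), not the number of times the id appears within a single array.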

How to find documents according to a common field value from another collection in mongodb

Assume I have 2 collections:
student:
{name: Joe, school: A}
{name: Kelly, school: B}
{name: Mike, school: C}
{name: Tom, school: D}
schoolRank: (all the school ranks are stored in one document)
{rank: [{school: A, value: 1},{school: B, value: 2},{school: C, value: 3},{school: D, value: 4}]}
Now, my question is how could I find the students whose school rank is higher than 3. (I am a newbie to mongodb. It seems like I need to use lookup but I am not sure how to do it exactly.) Thank you in advance!
You need to use $lookup. It is like a "join" in SQL.
But first of all, your schema could be much better. The schoolRank collection could have one document per school instead of a single array with all the values.
Check here the difference between the query with your schema and the schema with schoolRank split into different documents.
The second query returns only the document where the field school matches. The other returns the entire array for each document, because each student's school value also exists in the rank array.
So, with your schema you need extra stages. Maybe there is a more efficient way, but I'm not used to doing $lookup with a bad schema (sorry).
I've tried this query:
First, $lookup to join both collections (as I said before, the join basically adds the entire array to each document).
Then an extra stage to get the value returned from $lookup, using $set with the element at the first position.
After that, using $project, the query can filter the field rank_school and overwrite it to keep only the element whose field school is the same as the student's school.
Note that the above steps could be omitted using another schema.
Then, after the $project, there is a $match stage to get the documents whose rank_school.value is greater than or equal to 3.
And the last stage is another $project to remove the field rank_school.
This is the query:
db.student.aggregate([
{
"$lookup": {
"from": "schoolRank",
"localField": "school",
"foreignField": "rank.school",
"as": "rank_school"
}
},
{
"$set": { "rank_school": { "$arrayElemAt": [ "$rank_school", 0 ] } }
},
{
"$project": {
"_id": "$_id",
"name": "$name",
"school": "$school",
"rank_school": {
"$filter": {
"input": "$rank_school.rank",
"as": "rank_school_filter",
"cond": { "$eq": [ "$$rank_school_filter.school", "$school" ] }
}
}
}
},
{
"$match": { "rank_school.value": { "$gte": 3 } }
},
{
"$project": { "rank_school": 0 }
}
])
Example here.
And the output is:
[
{
"_id": ObjectId("5a934e000102030405000003"),
"name": "Mike",
"school": "C"
},
{
"_id": ObjectId("5a934e000102030405000004"),
"name": "Tom",
"school": "D"
}
]
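The answer notes that several stages could be omitted with a different schema. As a rough sketch, assuming schoolRank held one document per school such as { school: "C", value: 3 } (a hypothetical layout, not the asker's), the pipeline might shrink to the following.

```javascript
// Assumed schema (not the asker's): one schoolRank document per school,
// e.g. { school: "C", value: 3 }.
const pipeline = [
  {
    // Join each student to the rank document of their school.
    $lookup: {
      from: "schoolRank",
      localField: "school",
      foreignField: "school",
      as: "rank_school",
    },
  },
  // Keep students whose school has a rank value of 3 or more.
  { $match: { "rank_school.value": { $gte: 3 } } },
  // Drop the helper field from the output.
  { $project: { rank_school: 0 } },
];

// In the mongo shell this would be run as: db.student.aggregate(pipeline)
console.log(JSON.stringify(pipeline));
```

The $set and $filter stages are no longer needed because the $lookup now returns only the rank document of the student's own school.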

Does multiple $project stages in MongoDB aggregation affect performance

TL;DR
We add a $project stage in between the $match and $lookup stages in order to filter out unnecessary data or to alias fields. Those $project stages improve the readability of the query while debugging, but will they affect performance in any way when there is a large number of documents in every collection involved in the query?
Question Detailed
For example, I have two collections, schools and students, as shown below:
Yes, the schema design is bad, I know! MongoDB says to put everything in the same collection to avoid relations, but let's continue with this approach for now.
schools collection
{
"_id": ObjectId("5c04dca4289c601a393d9db8"),
"name": "First School Name",
"address": "1 xyz",
"status": 1,
// Many more fields
},
{
"_id": ObjectId("5c04dca4289c601a393d9db9"),
"name": "Second School Name",
"address": "2 xyz",
"status": 1,
// Many more fields
},
// Many more Schools
students collection
{
"_id": ObjectId("5c04dcd5289c601a393d9dbb"),
"name": "One Student Name",
"school_id": ObjectId("5c04dca4289c601a393d9db8"),
"address": "1 abc",
"Gender": "Male",
// Many more fields
},
{
"_id": ObjectId("5c04dcd5289c601a393d9dbc"),
"name": "Second Student Name",
"school_id": ObjectId("5c04dca4289c601a393d9db9"),
"address": "1 abc",
"Gender": "Male",
// Many more fields
},
// Many more students
Now, in my query as shown below, I have a $project stage after $match, just before $lookup.
So is this $project stage necessary?
Will this stage affect performance when there is a huge number of documents in all the collections involved in the query?
db.students.aggregate([
{
$match: {
"Gender": "Male"
}
},
// 1. Below $project stage is not necessary apart from filtering out and aliasing.
// 2. Will this stage affect performance when there are huge number of documents?
{
$project: {
"_id": 0,
"student_id": "$_id",
"student_name": "$name",
"school_id": 1
}
},
{
$lookup: {
from: "schools",
let: {
"school_id": "$school_id"
},
pipeline: [
{
$match: {
"status": 1,
$expr: {
$eq: ["$_id", "$$school_id"]
}
}
},
{
$project: {
"_id": 0,
"name": 1
}
}
],
as: "school"
}
},
{
$unwind: "$school"
}
]);
Give this a read: https://docs.mongodb.com/v3.2/core/aggregation-pipeline-optimization/
The part relevant to your particular case is:
The aggregation pipeline can determine if it requires only a subset of the fields in the documents to obtain the results. If so, the pipeline will only use those required fields, reducing the amount of data passing through the pipeline.
So, there is some optimization going on behind the scenes. You might try tacking on the explain option to your aggregation to see exactly what mongo is doing to attempt to optimize your pipeline.
I think what you are doing should actually help performance as you are decreasing the amount of data flowing through.
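To follow the suggestion of using explain, a small wrapper along these lines could be used. explainAggregate is a hypothetical helper name; it relies on the shell form db.collection.explain(verbosity).aggregate(pipeline), and db is assumed to be the database handle available in the mongo shell.

```javascript
// Hypothetical helper: ask the server how it would execute a pipeline.
// `db` is the database handle available in the mongo shell.
function explainAggregate(db, collectionName, pipeline) {
  return db
    .getCollection(collectionName)
    .explain("executionStats") // "queryPlanner" is the default verbosity
    .aggregate(pipeline);
}

// Example call in the shell, which can be repeated with and without the
// intermediate $project stage to compare the reported plans:
// explainAggregate(db, "students", [{ $match: { Gender: "Male" } }]);
```

Comparing the two explain outputs on realistic data volumes is likely to be more informative than reasoning about the $project stage in the abstract.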

MongoDB moving array of sub-documents to it's own collection

I'm looking to move an array of subdocuments into a collection of its own, keyed by the owner id. Currently, my collection is formed like this:
{
"_id": ObjectId("123"),
"username": "Bob Dole",
"logins": [{
"_id": ObjectId("abc123"),
"date": ISODate("2016")
}, {
"_id": ObjectId("def456"),
"date": ISODate("2016")
}]
}
I'm looking for the best way to write a script that would loop over each user and move each item in the logins array to its own "logins" collection, as follows:
{
"_id": ObjectId("abc123"),
"_ownerId": ObjectId("123"),
"date": ISODate("2016")
}
{
"_id": ObjectId("def456"),
"_ownerId": ObjectId("123"),
"date": ISODate("2016")
}
When the script ends, I'd like the logins array to be removed entirely from all users.
This query will create a new collection using the aggregation framework.
To see how the result looks, just remove the $out pipeline stage.
db.thinking.aggregate([
{
$unwind:"$logins"
},{
$project:{
_id:"$logins._id",
_ownerId:"$_id",
date:"$logins.date"
}
},
{
$out: "newCollection"
}
])
To delete the logins array from all user documents, as suggested in a comment:
db.thinking.updateMany({}, { "$unset": { "logins": "" } })
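Since the question asks for a script that loops over each user, the reshaping done by $unwind and $project can also be sketched in plain JavaScript. flattenLogins is an illustrative name, plain strings stand in for ObjectId and ISODate values, and the second sample user is invented to show the no-logins case.

```javascript
// Hypothetical helper: turn each embedded login into a standalone document
// carrying the owner's _id, mirroring the $unwind + $project stages above.
function flattenLogins(users) {
  const docs = [];
  for (const user of users) {
    // Users without a logins array contribute nothing, as with $unwind.
    for (const login of user.logins || []) {
      docs.push({ _id: login._id, _ownerId: user._id, date: login.date });
    }
  }
  return docs;
}

const sampleUsers = [
  {
    _id: "123",
    username: "Bob Dole",
    logins: [
      { _id: "abc123", date: "2016" },
      { _id: "def456", date: "2016" },
    ],
  },
  { _id: "456", username: "No Logins" },
];

console.log(flattenLogins(sampleUsers));
```

The returned documents have the shape requested in the question and could be inserted into the new collection before the logins array is unset.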