Remove documents not linked by a DBRef in MongoDB

Hello everyone, I am trying to remove all documents that are not linked by a DBRef field.
For example, I have 2 collections, User and Order.
Let's say the User collection has this structure:
{
"_id": "12345",
"lastName": "Michael",
"firstName": "Bernard"
}
And Order collection has this structure :
{
"_id": "123456",
"orderDate": "2022-12-26",
"userWhoOrder": {
"$ref": "user",
"$id": "12345"
},
"userWhoPay": {
"$ref": "user",
"$id": "123456"
}
}
Note: userWhoOrder and userWhoPay are DBRef fields (and they can point to the same user).
So I want to remove all users who are not present in the order collection (neither as userWhoOrder nor as userWhoPay).
I know I can do this in 2 steps:
1. Get a list of userWhoOrder and userWhoPay from the order collection.
2. Filter the users list to those not contained in the list from step 1 and remove them.
But I want to know if there is a proper way to do this with a single request (using $lookup, for example).
Here is what I tried to get the list of users to remove:
db.getCollection("user").aggregate([{$lookup: {
from: "order",
let: {userId: "$_id"},
pipeline: [
{$match: {$expr: {
$and: [
{$ne: ["$userWhoOrder.$id", "$$userId"]},
{$ne: ["$userWhoPay.$id", "$$userId"]}
]
}}}
],
as: "result"
}}])
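A note on the attempt above: the two $ne conditions select the orders that do not reference the user, so "result" ends up non-empty for almost every user. The usual anti-join shape is the opposite: find the orders that do reference the user and keep only the users for which nothing is found. The sketch below shows that logic in plain JavaScript on in-memory arrays. The extra sample users and the function name findUnreferencedUsers are made up for illustration, and this is not a tested MongoDB pipeline.

```javascript
// Hypothetical sample data mirroring the structures in the question.
const users = [
  { _id: "12345", lastName: "Michael", firstName: "Bernard" },
  { _id: "123456", lastName: "Doe", firstName: "Jane" },
  { _id: "999", lastName: "Smith", firstName: "Alex" },
];
const orders = [
  {
    _id: "123456",
    orderDate: "2022-12-26",
    userWhoOrder: { $ref: "user", $id: "12345" },
    userWhoPay: { $ref: "user", $id: "123456" },
  },
];

// Anti-join: keep users that no order references in either DBRef field.
function findUnreferencedUsers(users, orders) {
  // Step 1: collect every referenced user id once.
  const referenced = new Set();
  for (const order of orders) {
    if (order.userWhoOrder) referenced.add(order.userWhoOrder.$id);
    if (order.userWhoPay) referenced.add(order.userWhoPay.$id);
  }
  // Step 2: users whose _id never appears are the candidates for removal.
  return users.filter((user) => !referenced.has(user._id));
}

const idsToRemove = findUnreferencedUsers(users, orders).map((u) => u._id);
console.log(idsToRemove);
```

Against a real database, the resulting ids could then be passed to a delete with an $in filter on _id.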

Related

multi-stage aggregation pipeline matching data based on fields retrieved through $lookup

I'm trying to build a complex, nested aggregation pipeline in MongoDB (4.4.9 Community Edition, using the pymongo driver for Python 3.10).
There are relevant data points in different collections which I want to aggregate into one new view (ideally) or, if that doesn't work, a new collection.
The collections, and the relevant fields therein, follow a hierarchy. There is members, which contains the top-level key on which other data is to be merged, membershipNumber.
> members.find_one()
{'_id': ObjectId('61153299af6122XXXXXXXXXXXXX'), 'membershipNumber': 'N03XXXXXX'}
Then, there's a different collection, which contains membershipNumber, but also a different, linked field, an_user_id. an_user_id is used in other collections to denote records/fields in arrays that pertain to that particular user.
I 'join' members and an_users like so:
result = members.aggregate([
{
'$lookup': {
'from': 'an_users',
'localField': 'membershipNumber',
'foreignField': 'memref',
'as': 'an_users'
}
},
{ '$unwind' : '$an_users' },
{
'$project' : {
'_id' : 1,
'membershipNumber' : 1,
'an_user_id' : '$an_users.user_id'
}
}
]);
So far so good, this returns the desired, aggregated record:
{'_id': ObjectId('61153253aBBBBBBBBBBBB'),
'membershipNumber': 'N0XXXXXXXX',
'an_user_id': '48XXXXXX'}
Now, I have a third collection, emails, which contains the an_user_id as a string in arrays, denoting whether that user clicked a given email. Each record is an email, and the an_user_ids in the click array are the users that clicked a link in that email.
{'_id': ObjectId('blah'),
'email_id': '407XXX',
'actions_count': 17,
'administrative_title': 'test',
'bounce': ['3440XXXX'],
'click': ['38294CCC',
'418FFFF',
'48XXXXXX',
'38eGGGG']}
I want to count the number of occurrences of a given an_user_id (which I've obtained from the aggregation) in arrays (e.g. click, bounce, open) in the emails collection, and include it in the .aggregate call, to retrieve something like this:
{'_id': ObjectId('61153253aBBBBBBBBBBBB'),
'membershipNumber': 'N0XXXXXXXX',
'an_user_id': '48XXXXXX',
'n_email_clicks' : 412,
'n_email_bounces' : 12
}
Further, I might want to also attach counts of an_user_id in other collections in my DB.
Consider, e.g., this collection called events:
{
"_id": "617ffa96ee11844e143a63dd",
"id": "12345",
"administrative_title": "my_event",
"created_at": {
"$date": "2020-01-15T16:28:50.000Z"
},
"event_creator_id": "123456",
"event_title": "my_event",
"group_id": "123456",
"permalink": "event_id",
"rsvp_count": 54,
"rsvps": [{
"rsvp_id": "56789",
"display_name": "John Doe",
"rsvp_user_id": "48XXXXXX",
"rsvp_created_at": {
"$date": "2020-01-28T15:38:50.000Z"
},
"rsvp_updated_at": {
"$date": "2020-01-28T15:38:50.000Z"
},
"first_name": "John",
"last_name": "Doe"
}, {
"rsvp_id": "543895",
"display_name": "James Appleslice",
"rsvp_user_id": "N03XXXXXX",
"rsvp_created_at": {
"$date": "2020-02-05T13:15:14.000Z"
},
"rsvp_updated_at": {
"$date": "2020-02-05T13:15:14.000Z"
},
"first_name": "James",
"last_name": "Appleslice"}
]
}
So, the end-product would look something like this:
{'_id': ObjectId('61153253aBBBBBBBBBBBB'),
'membershipNumber': 'N0XXXXXXXX',
'an_user_id': '48XXXXXX',
'n_email_clicks' : 412,
'n_email_bounces' : 12,
'n_rsvps' : 12
}
My idea was to use the $lookup stage. However, I only know how to use it for matching on fields that exist in the parent collection I'm aggregating on, not on fields that have been generated during the aggregation.
Any help would be hugely appreciated!
You could use a $lookup with a pipeline. First, $lookup the user id, then use another $lookup inside it to check whether the user id exists in emails. Finally, add a few more stages to collect the results and format them as needed. You can also add an $out stage if you would like to write the results to another collection.
db.members.aggregate([{
$lookup: {
from: "an_users",
let: {
membershipNumber: "$membershipNumber"
},
pipeline: [
{
$match: {
$expr: {
$eq: [
"$memref",
"$$membershipNumber"
]
},
}
},
{
"$lookup": {
"from": "emails",
"localField": "user_id",
"foreignField": "click",
"as": "clicks"
}
},
{
"$project": {
"_id": 1,
"membershipNumber": 1,
"an_user_id": "$user_id",
"n_email_clicks": {
$size: "$clicks"
}
}
}
],
as: "details"
}
},
{
$replaceRoot: {
newRoot: {
$mergeObjects: [
{
$arrayElemAt: [
"$details",
0
]
},
"$$ROOT"
]
}
}
},
{
$project: {
details: 0
}
}])
Working example - https://mongoplayground.net/p/yrFsNp44hpi
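One thing worth keeping in mind about the inner $lookup: it matches the emails whose click array contains user_id, so $size yields the number of matching emails rather than the total number of array entries. The sketch below mirrors that counting logic in plain JavaScript on in-memory data. The sample emails and the helper name countEmailsWith are made up for illustration, and a bounce count is added the same way on the assumption that it would follow the same pattern.

```javascript
// Hypothetical sample data shaped like the collections in the question.
const anUser = { memref: "N03XXXXXX", user_id: "48XXXXXX" };
const emails = [
  { email_id: "407XXX", click: ["38294CCC", "48XXXXXX"], bounce: ["3440XXXX"] },
  { email_id: "408XXX", click: ["48XXXXXX"], bounce: ["48XXXXXX"] },
  { email_id: "409XXX", click: [], bounce: [] },
];

// Count the emails whose array field `field` contains the given user id.
function countEmailsWith(emails, field, userId) {
  return emails.filter((email) => (email[field] || []).includes(userId)).length;
}

const summary = {
  an_user_id: anUser.user_id,
  n_email_clicks: countEmailsWith(emails, "click", anUser.user_id),
  n_email_bounces: countEmailsWith(emails, "bounce", anUser.user_id),
};
console.log(summary);
```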

MongoDB delete embedded documents through array of Ids

I am working on a Node.js application that is using a MongoDB database with Mongoose. I've been stuck in this thing and didn't come up with the right query.
Problem:
There is a collection named chats which contain embedded documents (rooms) as an array of objects. I want to delete these embedded documents (rooms) through Ids which are in the array.
{
"_id": "ObjectId(6138e2b55c175846ec1e38c5)",
"type": "bot",
"rooms": [
{
"_id": "ObjectId(6138e2b55c145846ec1e38c5)",
"genre": "action"
},
{
"_id": "ObjectId(6138e2b545c145846ec1e38c5)",
"genre": "adventure"
}
]
},
{
"_id": "ObjectId(6138e2b55c1765846ec1e38c5)",
"type": "person",
"rooms": [
{
"_id": "ObjectId(6138e2565c145846ec1e38c5)",
"genre": "food"
},
{
"_id": "ObjectId(6138e2b5645c145846ec1e38c5)",
"genre": "sport"
}
]
},
{
"_id": "ObjectId(6138e2b55c1765846ec1e38c5)",
"type": "duo",
"rooms": [
{
"_id": "ObjectId(6138e21c145846ec1e38c5)",
"genre": "travel"
},
{
"_id": "ObjectId(6138e35645c145846ec1e38c5)",
"genre": "news"
}
]
}
I am converting my array of ids into MongoDB ObjectId so I can use these ids as match criteria.
const idsRoom = [
'6138e21c145846ec1e38c5',
'6138e2565c145846ec1e38c5',
'6138e2b545c145846ec1e38c5',
];
const objectIdArray = idsRoom.map((s) => mongoose.Types.ObjectId(s));
I am using this query on the chat collection, but it deletes the whole document. I only want to delete the embedded rooms documents, because the ids array only refers to the embedded documents.
Chat.deleteMany({ 'rooms._id': objectIdArray }, function (err) {
console.log('Delete successfully')
})
I really appreciate your help on this issue.
You have to use the $pull operator in an update query.
The query below looks for documents whose rooms array contains one of the _id values and uses $pull to remove the matching objects from the array:
yourModel.updateMany({
"rooms._id": {
"$in": [
"6138e21c145846ec1e38c5",
"6138e2565c145846ec1e38c5",
"6138e2b545c145846ec1e38c5"
]
}
},
{
"$pull": {
"rooms": {
"_id": {
"$in": [
"6138e21c145846ec1e38c5",
"6138e2565c145846ec1e38c5",
"6138e2b545c145846ec1e38c5"
]
}
}
}
})
Example here.
You can also run the update with an empty query object (in update queries the first object is the filter) and the result is the same. But it is better to tell MongoDB which documents to update using this first object.
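To make the difference from deleteMany concrete, here is a minimal in-memory sketch of what $pull with $in does to each document. It runs in plain JavaScript without a database, and the short ids ("c1", "r1", ...) are placeholders rather than real ObjectIds.

```javascript
// Hypothetical in-memory chats, shaped like the documents in the question.
const chats = [
  { _id: "c1", type: "bot", rooms: [{ _id: "r1", genre: "action" }, { _id: "r2", genre: "adventure" }] },
  { _id: "c2", type: "person", rooms: [{ _id: "r3", genre: "food" }, { _id: "r4", genre: "sport" }] },
];
const idsToPull = new Set(["r2", "r3"]);

// Mimic $pull with $in: every chat survives, only the matching rooms are dropped.
const updated = chats.map((chat) => ({
  ...chat,
  rooms: chat.rooms.filter((room) => !idsToPull.has(room._id)),
}));
console.log(updated.map((chat) => chat.rooms.map((room) => room.genre)));
```

The parent documents stay in place, which is the behaviour the question asks for; deleteMany would have removed both chats entirely.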

Link each element of array in a document to the corresponding element in an array of another document with MongoDB

Using MongoDB 4.2 and MongoDB Atlas to test aggregation pipelines.
I've got this products collection, containing documents with this schema:
{
"name": "TestProduct",
"relatedList": [
{id:ObjectId("someId")},
{id:ObjectId("anotherId")}
]
}
Then there's this cities collection, containing documents with this schema :
{
"name": "TestCity",
"instructionList": [
{ related_id: ObjectId("anotherId"), foo: bar},
{ related_id: ObjectId("someId"), foo: bar},
{ related_id: ObjectId("notUsefulId"), foo: bar}
...
]
}
My objective is to join both collections to output something like this (the operation is picking each related object from the instructionList in the city document to put it into the relatedList of the product document) :
{
"name": "TestProduct",
"relatedList": [
{ related_id: ObjectId("someId"), foo: bar},
{ related_id: ObjectId("anotherId"), foo: bar},
]
}
I tried using the $lookup operator for aggregation like this :
$lookup:{
from: 'cities',
let: {rId:'$relatedList._id'},
pipeline: [
{
$match: {
$expr: {
$eq: ["$instructionList.related_id", "$$rId"]
}
}
},
]
}
But it's not working, I'm a bit lost with this complex pipeline syntax.
Edit
By using unwind on both arrays :
[
{$unwind: "$relatedList"},
{$lookup:{
from: "cities",
let: { "rId": "$relatedList.id" },
pipeline: [
{$unwind:"$instructionList"},
{$match:{$expr:{$eq:["$instructionList.related_id","$$rId"]}}},
],
as:"instructionList",
}},
{$group: {
_id: "$_id",
instructionList: {$addToSet:"$instructionList"}
}}
]
I am able to achieve what I want, however,
I'm not getting a clean result at all :
{
"name": "TestProduct",
instructionList: [
[
{
"name": "TestCity",
"instructionList": {
"related_id":ObjectId("someId")
}
}
],
[
{
"name": "TestCity",
"instructionList": {
"related_id":ObjectId("anotherId")
}
}
]
]
}
How can I group everything to be as clean as stated for my original question ?
Again, I'm completely lost with the Aggregation framework.
"the operation is picking each related object from the instructionList in the city document to put it into the relatedList of the product document"
Given an example document on cities collection:
{"_id": ObjectId("5e4a22a08c54c8e2380b853b"),
"name": "TestCity",
"instructionList": [
{"related_id": "a", "foo": "x"},
{"related_id": "b", "foo": "y"},
{"related_id": "c", "foo": "z"}
]}
and an example document on products collection:
{"_id": ObjectId("5e45cdd8e8d44a31a432a981"),
"name": "TestProduct",
"relatedList": [
{"id": "a"},
{"id": "b"}
]}
You can try the following aggregation pipeline:
db.products.aggregate([
{"$lookup":{
"from": "cities",
"let": { "rId": "$relatedList.id" },
"pipeline": [
{"$unwind":"$instructionList"},
{"$match":{
"$expr":{
"$in":["$instructionList.related_id", "$$rId"]
}
}
}],
"as":"relatedList",
}},
{"$project":{
"name":"$name",
"relatedList":{
"$map":{
"input":"$relatedList",
"as":"x",
"in":{
"related_id":"$$x.instructionList.related_id",
"foo":"$$x.instructionList.foo"
}
}
}
}}
]);
to get a result like the following:
{ "_id": ObjectId("5e45cdd8e8d44a31a432a981"),
"name": "TestProduct",
"relatedList": [
{"related_id": "a", "foo": "x"},
{"related_id": "b", "foo": "y"}
]}
The above is tested in MongoDB v4.2.x.
"But it's not working, I'm a bit lost with this complex pipeline syntax."
The reason it's slightly complex here is that you have an array, relatedList, and also an array of subdocuments, instructionList. When you refer to instructionList.related_id (which can hold multiple values) with the $eq operator, the pipeline doesn't know which one to match.
In the pipeline above, I've added a $unwind stage to turn instructionList into multiple single documents, and then used $in to match each single instructionList.related_id value against the array of ids from relatedList.
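The same unwind-then-$in logic can be sketched in plain JavaScript on the example documents above. This is only an illustration of the matching step and runs without a database.

```javascript
// Sample documents taken from the example data in this answer.
const city = {
  name: "TestCity",
  instructionList: [
    { related_id: "a", foo: "x" },
    { related_id: "b", foo: "y" },
    { related_id: "c", foo: "z" },
  ],
};
const product = { name: "TestProduct", relatedList: [{ id: "a" }, { id: "b" }] };

// "$relatedList.id" resolves to the array of ids, like the rId variable.
const rId = product.relatedList.map((item) => item.id);

// Visiting instructionList element by element plays the role of $unwind;
// the includes() check plays the role of $in.
const relatedList = city.instructionList.filter((instruction) =>
  rId.includes(instruction.related_id)
);
console.log({ name: product.name, relatedList });
```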
I believe you just need to $unwind the arrays in order to lookup the relation, then $group to recollect them. Perhaps something like:
db.products.aggregate([
{$unwind:"$relatedList"},
{$lookup:{
from:"cities",
let:{rId:"$relatedList.id"},
pipeline:[
// keep only cities whose instructionList mentions this id
{$match:{$expr:{$in:["$$rId", "$instructionList.related_id"]}}},
{$unwind:"$instructionList"},
{$match:{$expr:{$eq:["$instructionList.related_id", "$$rId"]}}},
{$project:{_id:0, instruction:"$instructionList"}}
],
as: "lookedup"
}},
// take foo from the first looked-up element
{$addFields: {"relatedList.foo":{$arrayElemAt:["$lookedup.instruction.foo", 0]}}},
{$group: {
_id:"$_id",
root: {$first:"$$ROOT"},
relatedList:{$push:"$relatedList"}
}},
{$addFields:{"root.relatedList":"$relatedList"}},
{$replaceRoot:{newRoot:"$root"}}
])
A little about each stage:
1. $unwind duplicates the entire document for each element of the array, replacing the array with the single element
2. $lookup can then consider each element separately. The stages in $lookup.pipeline:
a. $match so we only unwind the document with matching ID
b. $unwind the array so we can consider individual elements
c. repeat the $match so we are only left with matching elements (hopefully just 1)
3. $addFields assigns the foo field retrieved from the lookup to the object from relatedList
4. $group collects together all of the documents with the same _id (i.e. that were unwound from a single original document), stores the first as 'root', and pushes all of the relatedList elements back into an array
5. $addFields moves the relatedList into root
6. $replaceRoot returns the root, which should now be the original document with the matching foo added to each relatedList element
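As a rough cross-check of what the stages are meant to produce, the sketch below does the same per-element join and regrouping in plain JavaScript. The sample data reuses the shapes from the earlier example and is only illustrative.

```javascript
// Hypothetical data reusing the shapes from the earlier example.
const products = [
  { _id: 1, name: "TestProduct", relatedList: [{ id: "a" }, { id: "b" }] },
];
const cities = [
  {
    name: "TestCity",
    instructionList: [
      { related_id: "a", foo: "x" },
      { related_id: "b", foo: "y" },
      { related_id: "c", foo: "z" },
    ],
  },
];

// Flattening all instruction lists stands in for $unwind inside the lookup.
const instructions = cities.flatMap((city) => city.instructionList);

const result = products.map((product) => ({
  ...product,
  // Mapping over relatedList stands in for $unwind followed by $group/$push.
  relatedList: product.relatedList.map((related) => {
    const match = instructions.find((i) => i.related_id === related.id);
    // Attach foo only when a matching instruction exists.
    return match ? { ...related, foo: match.foo } : related;
  }),
}));
console.log(JSON.stringify(result));
```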

Combined lookup and embed in the same MongoDB collection depending on root document structure

I want to make a conditional lookup with Mongo in the following way: a root document contains either a link (customerId) to the customers collection or directly embeds a customer, like this:
{
"_id" : 1,
"item": "item1",
"customer": { _id: 1, "name": "Jane Doe" }
},
{
"_id": 2,
"item": "item2",
"customerId": 1
}
customers collection:
{ _id: 1, "name": "Jane Johnson" }
The customers collection stores the current versions of customers; to maintain consistency, members of the items collection will contain just ids of customer. But if I want to freeze an item so it holds the version of its customer at a certain time, I will embed that customer directly into the item in question.
When searching for items I want them to appear uniformly (i.e. regardless of whether the customer is looked up or embedded, it will appear as an embedded field):
e.g.
[{
"_id" : 1,
"item": "item1",
"customer": { _id: 1, "name": "Jane Doe" } // historical version of Jane (embedded)
},
{
"_id": 2,
"item": "item2",
"customer": { _id: 1, "name": "Jane Johnson" } // current version of Jane by lookup
}]
Question 1: is this the right approach and if not what is the best practice for handling cases like this?
Question 2: if my approach is correct, how to best use aggregation framework to achieve this?
Thanks!
The answers:
Yes, you can achieve it with the aggregation framework, and it is one of the possible solutions (the other would be to implement it in your program).
Just use the $lookup (to merge data from the other collection) and $project (to generate the final result) pipeline stages.
Query example:
db.getCollection('items').aggregate([
{
$lookup: {
from: "customers",
localField: "customerId",
foreignField: "_id",
as: "customerData"
}
},
{
$project: {
"item": 1,
"customer": {
$cond: {
if: {$gt: [{$size: "$customerData"}, 0]},
then: {$arrayElemAt: ["$customerData", 0]},
else: "$customer"
}
}
}
}
]);
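To see what the $lookup plus $cond combination does to each document, here is a small in-memory sketch in plain JavaScript using the sample data from the question. It only illustrates the fallback logic and is not a substitute for running the pipeline.

```javascript
// Sample data from the question.
const customers = [{ _id: 1, name: "Jane Johnson" }];
const items = [
  { _id: 1, item: "item1", customer: { _id: 1, name: "Jane Doe" } },
  { _id: 2, item: "item2", customerId: 1 },
];

const resolved = items.map((doc) => {
  // Equivalent of $lookup: customers whose _id equals customerId.
  const customerData = customers.filter((c) => c._id === doc.customerId);
  return {
    _id: doc._id,
    item: doc.item,
    // Equivalent of $cond: prefer the looked-up customer, else the embedded one.
    customer: customerData.length > 0 ? customerData[0] : doc.customer,
  };
});
console.log(resolved);
```

The frozen item keeps its embedded "Jane Doe", while the linked item picks up the current "Jane Johnson".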

MongoDB moving array of sub-documents to its own collection

I'm looking to move an array of subdocuments into a collection of its own, keyed by the owner id. Currently, my collection is formed like this:
"_id": ObjectId("123"),
"username": "Bob Dole",
"logins": [{
"_id": ObjectId("abc123"),
"date": ISODate("2016")
}, {
"_id": ObjectId("def456"),
"date": ISODate("2016")
}]
I'm looking for the best way to write a script that would loop over each user and move each item in the logins array to its own "logins" collection, as follows:
{
"_id": ObjectId("abc123"),
"_ownerId": ObjectId("123"),
"date": ISODate("2016")
}
{
"_id": ObjectId("def456"),
"_ownerId": ObjectId("123"),
"date": ISODate("2016")
}
When the script ends, I'd like the logins array to be removed entirely from all users.
This query will create a new collection using the aggregation framework.
To see what the output looks like, just remove the $out stage.
db.thinking.aggregate([
{
$unwind:"$logins"
},{
$project:{
_id:"$logins._id",
_ownerId:"$_id",
date:"$logins.date"
}
},
{
$out: "newCollection"
}
])
To delete the array from the documents, as suggested in a comment:
db.thinking.update({},{ "$unset": { "logins": "" } },{ "multi": true })
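For anyone who wants to check the shape of the output before touching real data, the two steps can be mimicked in plain JavaScript on an in-memory document. This is only a sketch: plain strings stand in for ObjectId and ISODate values.

```javascript
// Hypothetical user document; plain strings stand in for ObjectId and ISODate.
const users = [
  {
    _id: "123",
    username: "Bob Dole",
    logins: [
      { _id: "abc123", date: "2016" },
      { _id: "def456", date: "2016" },
    ],
  },
];

// $unwind + $project: one output document per login, keyed by the login _id.
const logins = users.flatMap((user) =>
  (user.logins || []).map((login) => ({
    _id: login._id,
    _ownerId: user._id,
    date: login.date,
  }))
);

// $unset: copy each user without the logins array.
const strippedUsers = users.map(({ logins: _removed, ...rest }) => rest);
console.log(logins, strippedUsers);
```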