For context, I'm using MongoDB 3.6.4 and I'm trying to build a hierarchical schema for ACL permissions, but I'll boil the problem down and save the details.
Say I have a simple collection C, where parents is a list of references to other documents in C:
{
_id: ObjectId
parents: Array(ObjectId)
}
If I do an aggregation like:
[
{
$match: {_id: ObjectId("f00...")}
},
{
$graphLookup: {
from: "C",
startWith: "$parents",
connectFromField: "parents",
connectToField: "_id",
as: "graph"
}
}
]
I get back data like:
{
"_id": ObjectId("f00..."),
"parents": [ObjectId("f01..."), ObjectId("f02..."), ...],
"graph": [<doc1>, <doc2>, <doc3>, ...]
}
Is there a way to split the graph items out into documents? e.g. from the previous output example:
{
"_id": ObjectId("f00..."),
"parents": [ObjectId("f01..."), ObjectId("f02..."), ...]
}
<doc1>
<doc2>
<doc3>
You can try adding below stages to query.
[
{"$project":{"data":{"$concatArrays":[["$$ROOT"],"$graph"]}}},
{"$unwind":"$data"},
{"$project":{"data.graph":0}},
{"$replaceRoot":{"newRoot":"$data"}}
]
Related
I'm trying to build a complex, nested aggregation pipeline in MongoDB (4.4.9 Community Edition, using the pymongo driver for Python 3.10).
There are relevant data points in different collections which I want to aggregate into one, NEW (ideally) view (or, if that doesn't work) collection.
The collections, and the relevant fields therein follow a hierarchy. There is members, which contains the top-level key on which other data is to be merged,
membershipNumber.
> members.find_one()
{'_id': ObjectId('61153299af6122XXXXXXXXXXXXX'), 'membershipNumber': 'N03XXXXXX'}
Then, there's a different collection, which contains membershipNumber, but also a different, linked field, an_user_id. an_user_id is used in other collections to denote records/fields in arrays that pertain to that particular user.
I 'join' members and an_users like so:
result = members.aggregate([
{
'$lookup': {
'from': 'an_users',
'localField': 'membershipNumber',
'foreignField': 'memref',
'as': 'an_users'
}
},
{ '$unwind' : '$an_users' },
{
'$project' : {
'_id' : 1,
'membershipNumber' : 1,
'an_user_id' : '$an_users.user_id'
}
}
]);
So far so good, this returns the desired, aggregated record:
{'_id': ObjectId('61153253aBBBBBBBBBBBB'),
'membershipNumber': 'N0XXXXXXXX',
'an_user_id': '48XXXXXX'}
Now, I have a third collection, which contains the an_user_id as a string in arrays, denoting wherever that user clicked a given email, whereby a record is an email (and the an_user_ids in the clicks array are users that clicked a link in that email.
{'_id': ObjectId('blah'),
'email_id': '407XXX',
'actions_count': 17,
'administrative_title': 'test',
'bounce': ['3440XXXX'],
'click': ['38294CCC',
'418FFFF',
'48XXXXXX',
'38eGGGG'}
I want to count the number occurences of a given an_user_id (which I've attained from aggregating) in arrays (e.g. clicks, bounces, opens) in the emails collection, and include it in the .aggregate call, to retrieve something like this:
{'_id': ObjectId('61153253aBBBBBBBBBBBB'),
'membershipNumber': 'N0XXXXXXXX',
'an_user_id': '48XXXXXX',
'n_email_clicks' : 412,
'n_email_bounces' : 12
}
Further, I might want to also attach counts of an_user_id in other collections in my DB.
Consider, e.g., this collection called events:
{
"_id": "617ffa96ee11844e143a63dd",
"id": "12345",
"administrative_title": "my_event",
"created_at": {
"$date": "2020-01-15T16:28:50.000Z"
},
"event_creator_id": "123456",
"event_title": "my_event",
"group_id": "123456",
"permalink": "event_id",
"rsvp_count": 54,
"rsvps": [{
"rsvp_id": "56789",
"display_name": "John Doe",
"rsvp_user_id": "48XXXXXX",
"rsvp_created_at": {
"$date": "2020-01-28T15:38:50.000Z"
},
"rsvp_updated_at": {
"$date": "2020-01-28T15:38:50.000Z"
},
"first_name": "John",
"last_name": "Doe",
}, {
"rsvp_id": "543895",
"display_name": "James Appleslice",
"rsvp_user_id": "N03XXXXXX",
"rsvp_created_at": {
"$date": "2020-02-05T13:15:14.000Z"
},
"rsvp_updated_at": {
"$date": "2020-02-05T13:15:14.000Z"
},
"first_name": "James",
"last_name": "Appleslice"}
]
}
So, the end-product would look something like this:
{'_id': ObjectId('61153253aBBBBBBBBBBBB'),
'membershipNumber': 'N0XXXXXXXX',
'an_user_id': '48XXXXXX',
'n_email_clicks' : 412,
'n_email_bounces' : 12,
'n_rsvps' : 12
}
My idea was to use the $lookup parameter -- however, I only know how to use this for matching on fields that I have in the parent collection that I'm performing the aggregation on, but not on fields that have been generated in the process of the aggregation.
Any help would be hugely appreciated!
You could use $lookup pipeline. First you would $lookup the user id followed by another $lookup to verify if the user id exists in email. Lastly few more stages to collect the results and format per your need. Furthermore, you can add $out stage if you would like to write the results into another collection.
db.members.aggregate([{
$lookup: {
from: "an_users",
let: {
membershipNumber: "$membershipNumber"
},
pipeline: [
{
$match: {
$expr: {
$eq: [
"$memref",
"$$membershipNumber"
]
},
}
},
{
"$lookup": {
"from": "emails",
"localField": "user_id",
"foreignField": "click",
"as": "clicks"
}
},
{
"$project": {
"_id": 1,
"membershipNumber": 1,
"an_user_id": "$user_id",
"n_email_clicks": {
$size: "$clicks"
}
}
}
],
as: "details"
}
},
{
$replaceRoot: {
newRoot: {
$mergeObjects: [
{
$arrayElemAt: [
"$details",
0
]
},
"$$ROOT"
]
}
}
},
{
$project: {
details: 0
}
}])
Working example - https://mongoplayground.net/p/yrFsNp44hpi
I have a collection matches like this. I'm using players object {key: ObjectId, key: ObjectID} instead of classic array [ObjectId, ObjectID] for reference players collection
{
"_id": ObjectId("5eb93f8efd259cd7fbf49d55"),
"date": "01/01/2020",
"players": {
"home": ObjectId("5eb93f8efd259cd7fbf49d59"),
"away": ObjectId("5eb93f8efd259cd7fbf49d60")
}
},
{...}
And players collection:
{
"_id": ObjectId("5eb93f8efd259cd7fbf49d59"),
"name": "Roger Federer"
"country": "Suiza"
},
{
"_id": ObjectId("5eb93f8efd259cd7fbf49d60"),
"name": "Rafa Nadal"
"country": "España"
},
{...}
What's the better way to do mongoDB lookup? something like this is correct?
const rows = await db.collection('matches').aggregate([
{
$lookup: {
from: "players",
localField: "players.home",
foreignField: "_id",
as: "players.home"
}
},
{
$lookup: {
from: "players",
localField: "players.away",
foreignField: "_id",
as: "players.away"
},
{ $unwind: "$players.home" },
{ $unwind: "$players.away" },
}]).toArray()
I want output like this:
{
_id: 5eb93f8efd259cd7fbf49d55,
date: "12/05/20",
players: {
home: {
_id: 5eb93f8efd259cd7fbf49d59,
name: "Roger Federer",
country: "Suiza"
},
away: {
_id: 5eb93f8efd259cd7fbf49d60,
name: "Rafa Nadal",
country: "España"
}
}
}
{...}
You can try below aggregation query :
db.matches.aggregate([
{
$lookup: {
from: "players",
localField: "players.home",
foreignField: "_id",
as: "home"
}
},
{
$lookup: {
from: "players",
localField: "players.away",
foreignField: "_id",
as: "away"
}
},
/** Check output of lookup is not empty array `[]` & get first doc & write it to respective field, else write the same value as original */
{
$project: {
date: 1,
"players.home": { $cond: [ { $eq: [ "$home", [] ] }, "$players.home", { $arrayElemAt: [ "$home", 0 ] } ] },
"players.away": { $cond: [ { $eq: [ "$away", [] ] }, "$players.away", { $arrayElemAt: [ "$away", 0 ] } ] }
}
}
])
Test : mongoplayground
Changes or Issues with current Query :
1) As you're using two $unwind stages one after the other, If anyone of the field either home or away doesn't have a matching document in players collection then in the result you don't even get actual match document also, But why ? It's because if you do $unwind on [] (which is returned by lookup stage) then unwind will remove that parent document from result, To overcome this you need to use preservenullandemptyarrays option in unwind stage.
2) Ok, there is another way to do this without actually using $unwind. So do not use as: "players.home" or as: "players.away" cause you're actually writing back to original field, Just in case if you don't find a matching document an empty array [] will be written to actual fields either to "home" or "away" wherever there is not match (In this case you would loose actual ObjectId() value existing in that particular field in matches doc). So write output of lookup to a new field.
Or even more efficient way, instead of two $lookup stages (Cause each lookup has to go through docs of players collection again & again), you can try one lookup with multiple-join-conditions-with-lookup :
db.matches.aggregate([
{
$lookup: {
from: "players",
let: { home: "$players.home", away: "$players.away" },
pipeline: [
{
$match: { $expr: { $or: [ { $eq: [ "$_id", "$$home" ] }, { $eq: [ "$_id", "$$away" ] } ] } }
}
],
as: "data"
}
}
])
Test : mongoplayground
Note : Here all the matching docs from players which match with irrespective of away or home field will be pushed to data array. So to keep DB operation simple you can get that array from DB along with actual matches document & Offload some work to code which is to map respective objects from data array to players.home & players.away fields.
Using MongoDB 4.2 and MongoDB Atlas to test aggregation pipelines.
I've got this products collection, containing documents with this schema:
{
"name": "TestProduct",
"relatedList": [
{id:ObjectId("someId")},
{id:ObjectId("anotherId")}
]
}
Then there's this cities collection, containing documents with this schema :
{
"name": "TestCity",
"instructionList": [
{ related_id: ObjectId("anotherId"), foo: bar},
{ related_id: ObjectId("someId"), foo: bar}
{ related_id: ObjectId("notUsefulId"), foo: bar}
...
]
}
My objective is to join both collections to output something like this (the operation is picking each related object from the instructionList in the city document to put it into the relatedList of the product document) :
{
"name": "TestProduct",
"relatedList": [
{ related_id: ObjectId("someId"), foo: bar},
{ related_id: ObjectId("anotherId"), foo: bar},
]
}
I tried using the $lookup operator for aggregation like this :
$lookup:{
from: 'cities',
let: {rId:'$relatedList._id'},
pipeline: [
{
$match: {
$expr: {
$eq: ["$instructionList.related_id", "$$rId"]
}
}
},
]
}
But it's not working, I'm a bit lost with this complex pipeline syntax.
Edit
By using unwind on both arrays :
{
{$unwind: "$relatedList"},
{$lookup:{
from: "cities",
let: { "rId": "$relatedList.id" },
pipeline: [
{$unwind:"$instructionList"},
{$match:{$expr:{$eq:["$instructionList.related_id","$$rId"]}}},
],
as:"instructionList",
}},
{$group: {
_id: "$_id",
instructionList: {$addToSet:"$instructionList"}
}}
}
I am able to achieve what I want, however,
I'm not getting a clean result at all :
{
"name": "TestProduct",
instructionList: [
[
{
"name": "TestCity",
"instructionList": {
"related_id":ObjectId("someId")
}
}
],
[
{
"name": "TestCity",
"instructionList": {
"related_id":ObjectId("anotherId")
}
}
]
]
}
How can I group everything to be as clean as stated for my original question ?
Again, I'm completely lost with the Aggregation framework.
the operation is picking each related object from the instructionList in the city document to put it into the relatedList of the product document)
Given an example document on cities collection:
{"_id": ObjectId("5e4a22a08c54c8e2380b853b"),
"name": "TestCity",
"instructionList": [
{"related_id": "a", "foo": "x"},
{"related_id": "b", "foo": "y"},
{"related_id": "c", "foo": "z"}
]}
and an example document on products collection:
{"_id": ObjectId("5e45cdd8e8d44a31a432a981"),
"name": "TestProduct",
"relatedList": [
{"id": "a"},
{"id": "b"}
]}
You can achieve try using the following aggregation pipeline:
db.products.aggregate([
{"$lookup":{
"from": "cities",
"let": { "rId": "$relatedList.id" },
"pipeline": [
{"$unwind":"$instructionList"},
{"$match":{
"$expr":{
"$in":["$instructionList.related_id", "$$rId"]
}
}
}],
"as":"relatedList",
}},
{"$project":{
"name":"$name",
"relatedList":{
"$map":{
"input":"$relatedList",
"as":"x",
"in":{
"related_id":"$$x.instructionList.related_id",
"foo":"$$x.instructionList.foo"
}
}
}
}}
]);
To get a result as the following:
{ "_id": ObjectId("5e45cdd8e8d44a31a432a981"),
"name": "TestProduct",
"relatedList": [
{"related_id": "a", "foo": "x"},
{"related_id": "b", "foo": "y"}
]}
The above is tested in MongoDB v4.2.x.
But it's not working, I'm a bit lost with this complex pipeline syntax.
The reason why it's slightly complex here is because you have an array relatedList and also an array of subdocuments instructionList. When you refer to instructionList.related_id (which could mean multiple values) with $eq operator, the pipeline doesn't know which one to match.
In the pipeline above, I've added $unwind stage to turn instructionList into multiple single documents. Afterward, using $in to express a match of single value of instructionList.related_id in array relatedList.
I believe you just need to $unwind the arrays in order to lookup the relation, then $group to recollect them. Perhaps something like:
.aggregeate([
{$unwind:"relatedList"},
{$lookup:{
from:"cities",
let:{rId:"$relatedList.id"}
pipeline:[
{$match:{$expr:{$eq:["$instructionList.related_id", "$$rId"]}}},
{$unwind:"$instructionList"},
{$match:{$expr:{$eq:["$instructionList.related_id", "$$rId"]}}},
{$project:{_id:0, instruction:"$instructionList"}}
],
as: "lookedup"
}},
{$addFields: {"relatedList.foo":"$lookedup.0.instruction.foo"}},
{$group: {
_id:"$_id",
root: {$first:"$$ROOT"},
relatedList:{$push:"$relatedList"}
}},
{$addFields:{"root.relatedList":"$relatedList"}},
{$replaceRoot:{newRoot:"$root"}}
])
A little about each stage:
$unwind duplicates the entire document for each element of the array,
replace the array with the single element
$lookup can then consider each element separately. The stages in $lookup.pipeline:
a. $match so we only unwind the document with matching ID
b. $unwind the array so we can consider individual elements
c. repeat the $match so we are only left with matching elements (hopefully just 1)
$addFields assigns the foo field retrieved from the lookup to the object from relatedList
$group collects together all of the documents with the same _id (i.e. that were unwound from a single original document), stores the first as 'root', and pushes all of the relatedList elements back into an array
$addFields moves the relatedList in to root
$replaceRoot returns the root, which should now be the original document with the matching foo added to each relatedList element
For today's task I am trying aggregating documents in a collection (let's call it collection1 and in one of the pipeline's stages I am trying to use $lookup to retrieve documents from another collection (let's call it collection2).
collection1 object model:
{
"field1": "value1",
"field2": "value2"
"field3": "value3"
}
collection2 object model:
{
"field1: "value1",
"field2"; "value2",
"field3: {
"field31": "value31",
"field32": "value32"
}
}
What I am exactly trying to do is to retrieve the documents from collection2 where field3.field31 equals value of the collection1s field1.
My $lookup stage looks like approx like this but currently it doesn't seem to work. I did not find any clue if this should work but looking forward to your replies.
{
$lookup: {
from: "collection2",
let: {
"c": "$field1",
"l": "$field2",
"t": "$field3",
},
pipeline: [
{
$match: {
$expr: {
$and: [
{ $eq: ["$field1", "$$c"] },
{ $eq: ["$field2", "$$l"] },
{ $eq: ["$field3.field31", "$$t"] },
]
}
},
},
],
as: "awesomejoin"
}
}
I want to avoid having a project or a group and then unwinding and filtering again. My wish is to get the records directly from the match stage thinking this is better in terms of performance...
Let me know your thoughts on this.
Thank you
Please try this :
db.Collection1.aggregate([
{
$lookup:
{
from: "Collection2",
localField: "field1",
foreignField: "field3.field31",
as: "docs"
}
}
])
It should be simple with plain $lookup, not exactly sure why you're creating local variables and looking for equals on same variables, Also $unwind will be used on arrays, over objects you could access inner elements using . notation same as like in programming languages.
Ref : $lookup
I have collections of the following structure:
objects:
[{"type": "someTypeOne", "menuId": 1},
{"type": "someTypeTwo", "menuId": 1},
{"type": "someTypeOne", "menuId": 2}]
menus:
[{"id":1, "type": "someTypeOne"},
{"id":2, "type": "someTypeOne"}]
I need to find all objects where "type" property doesn't match its menus "type". In this case the desired output would be:
[{"type": "someTypeTwo", "menuId": 1}]
I think that I should use aggregation for this one and I'm fiddling with it at the moment but I was not able to formulate a working query so far.
Thanks
You can try below aggregation:
db.objects.aggregate([
{
$lookup: {
from: "menus",
localField: "menuId",
foreignField: "id",
as: "menu"
}
},
{
$unwind: "$menu"
},
{
$match: {
$expr: {
$ne: [ "$menu.type", "$type" ]
}
}
},
{
$project: {
menu: 0
}
}
])
$lookup allows you to get data from both collections, then you can run $unwind on menu array to get single menu per document and you can apply you inequality condition using $match and $expr
Mongo Playground