Pretty $lookup on collection - MongoDB

I have two collections:
Competition
{
"_id": "326",
"signed_up": [
{"_id": "00001","category": ["First"], "status": true}]
}
and Playing
{
"_id": "6076e504db319b11c077d473",
"competition_id": "326",
"player": {"player_id": "00001","handicap": 6},
"totalScore": 6
}
I want to add totalScore from playing to the matching entry of the competition.signed_up array, based on the player_id field:
{
"_id": "326",
"signed_up": [
{"_id": "00001","category": ["First"], "status": true, "totalScore": 6}]
}
I do not know how to do this.

I'm not telling you this is the optimal way, but it seems to work...
Let's start out with the data. I've added one more player to the competition, just to make it a little easier to see that things work as expected:
db.competition.insertOne({
"_id": "326",
"signed_up": [{
"_id": "00001",
"category": ["First"],
"status": true
}, {
"_id": "00002",
"category": ["First"],
"status": true
}]
})
db.playing.insertMany([
{
"competition_id": "326",
"player": {
"playing_id": "00001"
},
"totalScore": 6
},
{
"competition_id": "326",
"player": {
"playing_id": "00002"
},
"totalScore": 2
}
]);
Now for the aggregation...
db.competition.aggregate([
// Even though the [documentation](https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/#use--lookup-with-an-array) states that unwinding is no longer necessary,
// I'm not sure if that includes arrays of subdocuments or only arrays of primitives. So I've chosen to unwind anyway...
{
$unwind: "$signed_up"
},
// => { "_id": "326", "signed_up": { "_id": "00001", ....} }
// now we have each player in its own document and can easily look up the score from the playing collection
{
$lookup: {
from: 'playing',
localField: 'signed_up._id',
foreignField: 'player.playing_id',
as: 'player'
}
},
// => { "_id": "326", "signed_up": {...}, "player": [{ "competition_id": "326"...}, ..]}
// now we have the matching playing documents as an array on each document.
// But we know there will only be one match and don't really care for the array,
// so we have to do some gymnastics to get the data we want where we want it
{
$project: {
"signed_up": {
$let: {
vars: {
player: { $arrayElemAt: [ "$player", 0 ] }
},
in: {
$mergeObjects: [
"$signed_up",
{ "totalScore": "$$player.totalScore" }
]
}
}
}
}
},
// => { "_id": "326", "signed_up": { "_id": "00001", .... , "totalScore": 6 } }
// Now we're pretty much done, except that we need to group the documents back
// into the original competition documents
{
$group: {
_id: "$_id",
signed_up: {
$push: "$signed_up"
}
}
}
// => { "_id": "326", "signed_up": [ { "_id": "00001", ....}, {"_id": "00002", ...} ] }
// And that completes the pipeline.
]);
I see that you also have the id of the competition document on the playing document, so I suspect that you need an additional check in the lookup to make sure you get the correct match. As my code stands, if a player takes part in more than one competition, the lookup adds that player's playing documents from every competition to the player array.
If you take a look at the example Specify Multiple Join Conditions with $lookup in the documentation, you see how you can change the $lookup stage to do a more precise match on the target documents by using a pipeline on the target collection. It also shows how you can include a projection in that pipeline to only return the data that you really want.
Edit
Take a look at the following alternative lookup step:
{
$lookup: {
from: 'playing',
let: { playerid: "$signed_up._id", compid: "$_id" },
pipeline: [
{ $match: {
$expr: {
$and: [
{ $eq: ["$player.playing_id","$$playerid" ] },
{ $eq: ["$competition_id", "$$compid" ] }
]
}
}
},
{ $project: {
_id: 0,
"totalScore": 1
}
}
],
as: 'player'
}
}
This stores the player's id and the competition id from the current document in two variables. Then it uses those two variables in a pipeline run against the other collection. In addition to the $match to select the right player/competition document, it also includes a $project to get rid of the other fields on the playing documents. It will still return an array of one object, but it might save some bytes of memory usage...
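To check the pipeline's output against sample data, here is a minimal plain-JavaScript sketch of the result it is meant to produce. It does not talk to MongoDB at all. The helper name addTotalScores is hypothetical, and the field names (player.playing_id, competition_id) are the ones used in the sample data of this answer.

```javascript
// Plain-JavaScript sketch (no MongoDB involved) of the intended result:
// copy totalScore from the matching playing document onto each signed_up entry.
// addTotalScores is a hypothetical helper, not part of any MongoDB API.
function addTotalScores(competitions, playing) {
  return competitions.map((competition) => ({
    ...competition,
    signed_up: competition.signed_up.map((entry) => {
      // Match on BOTH competition id and player id, like the pipeline-style $lookup.
      const match = playing.find(
        (p) =>
          p.competition_id === competition._id &&
          p.player.playing_id === entry._id
      );
      // Entries without a playing document are returned unchanged.
      return match ? { ...entry, totalScore: match.totalScore } : entry;
    }),
  }));
}
```

Calling addTotalScores with the two sample arrays inserted above should give the same signed_up entries that the aggregation returns, which makes it handy as a reference when testing the pipeline.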

Related

Get Data from another collection (string -> ObjectId)

Let's say I have these two collections:
// Members:
{
"_id":{
"$oid":"60dca71f0394f430c8ca296d"
},
"church":"60dbb265a75a610d90b45c6b",
"name":"Julio Verne Cerqueira"
},
{
"_id":{
"$oid":"60dca71f0394f430c8ca29a8"
},
"nome":"Ryan Steel Oliveira",
"church":"60dbb265a75a610d90b45c6c"
}
And
// Churches
{
"_id": {
"$oid": "60dbb265a75a610d90b45c6c"
},
"name": "Saint Antoine Hill",
"active": true
},
{
"_id": {
"$oid": "60dbb265a75a610d90b45c6b"
},
"name": "Jackeline Hill",
"active": true
}
And I want to query it and have a result like this:
// Member with Church names
{
"_id":{
"$oid":"60dca71f0394f430c8ca296d"
},
"church":"Jackeline Hill",
"name":"Julio Verne Cerqueira"
},
{
"_id":{
"$oid":"60dca71f0394f430c8ca29a8"
},
"church":"Saint Antoine Hill",
"nome":"Ryan Steel Oliveira"
}
If I try a Lookup, I have the following Result: (It is getting the entire churches collection).
How would I do the query, so it gives me only the one church that member is related to?
And, if possible, how to Sort the result in alphabetical order by church then by name?
Note: MongoDB version 4.4.10
There is a matching error in the $match stage of the $lookup pipeline.
It should be:
$match: {
$expr: {
$eq: [
"$_id",
"$$searchId"
]
}
}
From the provided documents, each member references exactly one church (while many members can share the same church). Hence, when you join members with churchies via $lookup, the output church field will be an array containing only one churchies document.
Aggregation pipeline stages:
$lookup - Join members collection (by $$searchId) with churchies (by _id).
$unwind - Deconstruct church array field to multiple documents.
$project - Decorate output document.
$sort - Sort by church and name ascending.
db.members.aggregate([
{
"$lookup": {
"from": "churchies",
"let": {
searchId: {
"$toObjectId": "$church"
}
},
"pipeline": [
{
$match: {
$expr: {
$eq: [
"$_id",
"$$searchId"
]
}
}
},
{
$project: {
name: 1
}
}
],
"as": "church"
}
},
{
"$unwind": "$church"
},
{
$project: {
_id: 1,
church: "$church.name",
name: 1
}
},
{
"$sort": {
"church": 1,
"name": 1
}
}
])
Sample Mongo Playground
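As a way to sanity-check the expected result, here is a plain-JavaScript sketch of the same join and sort. It does not use MongoDB. The helper name attachChurchNames is hypothetical, and ids are compared as plain strings, which stands in for the $toObjectId conversion done in the pipeline.

```javascript
// Plain-JavaScript sketch of the join and sort (no MongoDB involved).
// attachChurchNames is a hypothetical helper; ids are treated as plain strings here.
function attachChurchNames(members, churches) {
  const nameById = new Map(churches.map((c) => [String(c._id), c.name]));
  // Simple code-unit comparison; a server-side collation may order strings differently.
  const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0);
  return members
    // Like $unwind on the joined array: members without a matching church are dropped.
    .filter((m) => nameById.has(String(m.church)))
    // Replace the church id with the church name, like the final $project.
    .map((m) => ({ ...m, church: nameById.get(String(m.church)) }))
    // Sort by church first, then by name, both ascending.
    .sort((a, b) => cmp(a.church, b.church) || cmp(a.name, b.name));
}
```

Each member ends up with exactly one church name, and the list is ordered by church and then by member name.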

aggregate with unwind, how to limit per document and not globally? (mongodb)

I have a collection with 300 documents. Each document has an array field called items (each item of the array is an object), something like this:
*DOCUMENT 1:*
_id: **********,
title: "test",
desc: "test desc",
items (array)
0: (object)
title: (string)
tags: (array of strings)
1: (object)
etc.
I need to retrieve items by tags, and I'm using the query below. I have to $limit the results to something like 200, or the result is too big. The problem is that if the first document has more than 200 items, only items of that document are returned. What I need is to limit results PER document: for instance, retrieve 5 items for each document where the tags match ($all) the tags provided.
const foundItems = await db.collection('store').aggregate([
{
$unwind: '$items'
},
{
$match: {
'items.tags': { $all : tagsArray }
}
},
{
$project: {
myitem: '$items',
desc: 1,
title: 1
}
},
{
$limit: 200
}
]).toArray()
To make it clearer: in an ideal world, what I'd need would be something like:
{
$limit: 5,
$per: _id,
$totalLimit: 200
}
instead of $limit: 200. Is this achievable somehow? I didn't find any explanation of it in the official documentation.
What I tried was adding $sort right before $limit, hoping it would give the behaviour I'm looking for, but unfortunately it doesn't work that way: placing it before or after the limit makes no difference.
And I can't really use $sample, since the results are more than 5% of the collection.
Updated demo - https://mongoplayground.net/p/nM6T9XVa-XK
db.collection.aggregate([
{ $unwind: "$items" },
{
$match: {
"items.tags": {
$all: [ "a","b" ]
}
}
},
{
"$group": {
"_id": "$_id",
"myitem": { "$push": "$items" },
desc: { "$first": "$desc" },
title: { "$first": "$title" }
}
},
{
"$project": {
"_id": 1,
desc: 1,
title: 1,
"myitem": { $slice: [ "$myitem", 2 ]
}
}
},
{
$unwind: "$myitem"
}
])
Demo - https://mongoplayground.net/p/BESptnyUfSS
After matching the records, you can $group them by _id, then $project them and limit each group using $slice.
db.collection.aggregate([
{ $unwind: "$items" },
{
$match: {
"items.tags": { $all: [ "a", "b" ]
}
}
},
{
$project: {
_id: 1, myitem: "$items", desc: 1,title: 1
}
},
{
"$group": {
"_id": "$_id",
"myitem": { "$push": "$myitem" }
}
},
{
"$project": {
"_id": 1,
"myitem": {
$slice: [ "$myitem", 1 ] // limit records here per group / id
}
}
}
])
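The grouping and slicing above can also be expressed as a plain-JavaScript sketch, which additionally shows the overall cap from the question (5 per document, 200 in total). It does not use MongoDB. The helper name limitPerDocument and its parameters are hypothetical, and it assumes every item has a tags array and that tags is non-empty.

```javascript
// Plain-JavaScript sketch (no MongoDB involved) of "perDoc items per document, total overall".
// limitPerDocument and its parameter names are hypothetical.
function limitPerDocument(docs, tags, perDoc, total) {
  const results = [];
  for (const doc of docs) {
    const matching = doc.items
      // Same test as { $all: tags }: the item must carry every requested tag.
      .filter((item) => tags.every((t) => item.tags.includes(t)))
      // Same idea as $slice: keep at most perDoc items for this document.
      .slice(0, perDoc);
    for (const item of matching) {
      // Overall cap, like a final $limit stage.
      if (results.length >= total) return results;
      results.push({ _id: doc._id, title: doc.title, desc: doc.desc, myitem: item });
    }
  }
  return results;
}
```

In the pipeline, the per-document cap is the second argument of $slice, and a trailing $limit stage after the final $unwind would play the role of total.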

Given an array of objects, how can I filter the result of the existing objects in database according to each match of the array?

I have a collection with a structure like this:
[
{
"toystore": 22,
"toystore_name": "Toystore A",
"toys": [
{
"toy": "buzz",
"code": 17001,
"price": 500
},
{
"toy": "woddy",
"code": 17002,
"price": 1000
},
{
"toy": "pope",
"code": 17003,
"price": 300
}
]
},
{
"toystore": 11,
"toystore_name": "Toystore B",
"toys": [
{
"toy": "jessie",
"code": 17005,
"price": 500
},
{
"toy": "rex",
"code": 17006,
"price": 2000
}
]
}
]
I have n toy stores, and each toy store lists its available toys in the toys field (an array).
The codes I want to search for may contain repeats:
[ { "toys.code": 17001 }, { "toys.code": 17003 }, { "toys.code": 17005 }, { "toys.code": 17005 }]
I want the result to contain one entry for each of these toys.code values, even when they are repeated. Currently, repeated codes are not repeated in the result (for example, code 17005).
This is my current output:
[
{
"_id": "Toystore A",
"toy_array": [
{
"price_original": 500,
"toy": "buzz"
},
{
"price_original": 300,
"toy": "pope"
}
]
},
{
"_id": "Toystore B",
"toy_array": [
//**********
// I searched for code 17005 twice, so this object should appear twice, but it appears only once.
{
"price_original": 500,
"toy": "jessie"
}
]
}
]
How can I get a result returned for every match in my array?
This is my live code:
db.collection.aggregate([
{
$unwind: "$toys"
},
{
$match: {
$or: [
{
"toys.code": 17001
},
{
"toys.code": 17003
},
{
"toys.code": 17005
},
{
"toys.code": 17005
}
],
}
},
{
$group: {
_id: "$toystore_name",
toy_array: {
$push: {
price_original: "$toys.price",
toy: "$toys.toy"
},
},
},
},
])
https://mongoplayground.net/p/g1-oST015y0
The $match stage examines each document in the pipeline and evaluates the provided criteria, and either eliminates the document, or passes it along to the next stage. It does not iterate the match criteria and examine the entire stream of documents for each one, which is what needs to happen in order to duplicate the document that is referenced twice.
This can be done, but you will need to pass the array of codes twice in the pipeline, once to eliminate documents that don't match at all, and again to allow the duplication you are looking for.
The stages needed are:
$match to eliminate toy stores that don't have any of the requested toys
$project using
  - $map to iterate the search array
  - $filter to select matching toys
  - $reduce to eliminate empty arrays, and recombine the entries into a single array
an additional $project to remove the codes from toy_array
var codearray = [17001, 17003, 17005, 17005];
db.collection.aggregate([
{$match: {"toys.code": {$in: codearray }}},
{$project: {
_id: "$toystore_name",
toy_array: {
$reduce: {
input: {
$map: {
input: codearray,
as: "qcode",
in: {
$filter: {
input: "$toys",
as: "toy",
cond: {$eq: [ "$$toy.code","$$qcode" ]}
}
}
}
},
initialValue: [],
in: {
$cond: {
if: {$eq: ["$$this",[]]},
then: "$$value",
else: {$concatArrays: ["$$value", "$$this"]}
}
}
}
}
}},
{$project: {"toy_array.code": 0}}
])
Playground
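The same $map / $filter / $reduce idea, written as a plain-JavaScript sketch that can be run without MongoDB to see why iterating over the requested codes produces the duplicates. The helper name toysPerRequestedCode is hypothetical.

```javascript
// Plain-JavaScript sketch (no MongoDB involved) of the $map / $filter / $reduce idea:
// one pass over the requested codes per store, so repeated codes yield repeated toys.
// toysPerRequestedCode is a hypothetical helper name.
function toysPerRequestedCode(stores, codes) {
  return stores
    // Like the leading $match: skip stores that have none of the requested toys.
    .filter((store) => store.toys.some((toy) => codes.includes(toy.code)))
    .map((store) => ({
      _id: store.toystore_name,
      // flatMap plays the role of $map followed by $reduce with $concatArrays.
      toy_array: codes.flatMap((code) =>
        store.toys
          .filter((toy) => toy.code === code)
          // Drop the code from the output, like the final $project.
          .map(({ code: _code, ...rest }) => rest)
      ),
    }));
}
```

Because the outer loop runs over the code list rather than over the toys, a code that is requested twice contributes its toy twice.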

Lookup and group from two fields in one aggregation

I have an aggregation that looks like this:
userSchema.statics.getCounts = function (req, type) {
return this.aggregate([
{ $match: { organization: req.user.organization._id } },
{
$lookup: {
from: 'tickets', localField: `${type}Tickets`, foreignField: '_id', as: `${type}_tickets`,
},
},
{ $unwind: `$${type}_tickets` },
{ $match: { [`${type}_tickets.createdAt`]: { $gte: new Date(moment().subtract(4, 'd').startOf('day').utc()), $lt: new Date(moment().endOf('day').utc()) } } },
{
$group: {
_id: {
groupDate: {
$dateFromParts: {
year: { $year: `$${type}_tickets.createdAt` },
month: { $month: `$${type}_tickets.createdAt` },
day: { $dayOfMonth: `$${type}_tickets.createdAt` },
},
},
userId: `$${type}_tickets.assignee_id`,
},
ticketCount: {
$sum: 1,
},
},
},
{
$sort: { '_id.groupDate': -1 },
},
{ $group: { _id: '$_id.userId', data: { $push: { groupDate: '$_id.groupDate', ticketCount: '$ticketCount' } } } },
]);
};
Which outputs data like this:
[
{
_id: 5aeb6b71709f43359e0888bb,
data: [
{ "groupDate": "2018-05-07T00:00:00.000Z", ticketCount: 4 }
]
}
]
Ideally though, I would have data like this:
[
{
_id: 5aeb6b71709f43359e0888bb,
data: [
{ "groupDate": "2018-05-07T00:00:00.000Z", assignedCount: 4, resolvedCount: 8 }
]
}
]
The difference being that the object for the user would output both the total number of assigned tickets and the total number of resolved tickets for each date.
My userSchema is like this:
const userSchema = new Schema({
firstName: String,
lastName: String,
assignedTickets: [
{
type: mongoose.Schema.ObjectId,
ref: 'Ticket',
index: true,
},
],
resolvedTickets: [
{
type: mongoose.Schema.ObjectId,
ref: 'Ticket',
index: true,
},
],
}, {
timestamps: true,
});
An example user doc is like this:
{
"_id": "5aeb6b71709f43359e0888bb",
"assignedTickets": ["5aeb6ba7709f43359e0888bd", "5aeb6bf3709f43359e0888c2", "5aec7e0adcdd76b57af9e889"],
"resolvedTickets": ["5aeb6bc2709f43359e0888be", "5aeb6bc2709f43359e0888bf"],
"firstName": "Name",
"lastName": "Surname",
}
An example ticket doc is like this:
{
"_id": "5aeb6ba7709f43359e0888bd",
"ticket_id": 120292,
"type": "assigned",
"status": "Pending",
"assignee_email": "email#gmail.com",
"assignee_id": "5aeb6b71709f43359e0888bb",
"createdAt": "2018-05-02T20:05:59.147Z",
"updatedAt": "2018-05-03T20:05:59.147Z",
}
I've tried adding multiple lookups and group stages, but I keep getting an empty array. If I only do one lookup and one group, I get the correct counts for the searched on field, but I'd like to have both fields in one query. Is it possible to have the query group on two lookups?
In short, you seem to be coming to terms with setting up your models in mongoose and have gone overboard with references. You really should not keep the arrays within the "User" documents. This is an "anti-pattern": mongoose originally used it as a convention for keeping "references" for population, because it did not understand how to translate references kept in the "child" back to the "parent".
You already have that data in each "Ticket", and the natural form of $lookup is to use that "foreignField" in reference to the detail from the local collection. In this case the "assignee_id" on the tickets suffices for matching back to the "_id" of the "User". Though you don't state it, your "status" should indicate whether a ticket is "assigned" (when in the "Pending" state) or "resolved" (when it is not).
For the sake of simplicity we are going to consider the state "resolved" if it is anything other than "Pending" in value, but extending on the logic from the example for actual needs is not the problem here.
Basically then we resolve to a single $lookup operation by actually using the natural "foreign key" as opposed to keeping separate arrays.
MongoDB 3.6 and greater
Ideally you would use features from MongoDB 3.6 with sub-pipeline processing here:
// Better date calculations
const oneDay = (1000 * 60 * 60 * 24);
var now = Date.now(),
end = new Date((now - (now % oneDay)) + oneDay),
start = new Date(end.valueOf() - (4 * oneDay));
User.aggregate([
{ "$match": { "organization": req.user.organization._id } },
{ "$lookup": {
"from": Ticket.collection.name,
"let": { "id": "$_id" },
"pipeline": [
{ "$match": {
"createdAt": { "$gte": start, "$lt": end },
"$expr": {
"$eq": [ "$$id", "$assignee_id" ]
}
}},
{ "$group": {
"_id": {
"status": "$status",
"date": {
"$dateFromParts": {
"year": { "$year": "$createdAt" },
"month": { "$month": "$createdAt" },
"day": { "$dayOfMonth": "$createdAt" }
}
}
},
"count": { "$sum": 1 }
}},
{ "$group": {
"_id": "$_id.date",
"data": {
"$push": {
"k": {
"$cond": [
{ "$eq": ["$_id.status", "Pending"] },
"assignedCount",
"resolvedCount"
]
},
"v": "$count"
}
}
}},
{ "$sort": { "_id": -1 } },
{ "$replaceRoot": {
"newRoot": {
"$mergeObjects": [
{ "groupDate": "$_id", "assignedCount": 0, "resolvedCount": 0 },
{ "$arrayToObject": "$data" }
]
}
}}
],
"as": "data"
}},
{ "$project": { "data": 1 } }
])
From MongoDB 3.2 and upwards
Or where you lack those features we use a different pipeline process and a little data transformation after the results are returned from the server:
User.aggregate([
{ "$match": { "organization": req.user.organization._id } },
{ "$lookup": {
"from": Ticket.collection.name,
"localField": "_id",
"foreignField": "assignee_id",
"as": "data"
}},
{ "$unwind": "$data" },
{ "$match": {
"data.createdAt": { "$gte": start, "$lt": end }
}},
{ "$group": {
"_id": {
"userId": "$_id",
"date": {
"$add": [
{ "$subtract": [
{ "$subtract": [ "$data.createdAt", new Date(0) ] },
{ "$mod": [
{ "$subtract": [ "$data.createdAt", new Date(0) ] },
oneDay
]}
]},
new Date(0)
]
},
"status": "$data.status"
},
"count": { "$sum": 1 }
}},
{ "$group": {
"_id": {
"userId": "$_id.userId",
"date": "$_id.date"
},
"data": {
"$push": {
"k": {
"$cond": [
{ "$eq": [ "$_id.status", "Pending" ] },
"assignedCount",
"resolvedCount"
]
},
"v": "$count"
}
}
}},
{ "$sort": { "_id.userId": 1, "_id.date": -1 } },
{ "$group": {
"_id": "$_id.userId",
"data": {
"$push": {
"groupDate": "$_id.date",
"data": "$data"
}
}
}}
])
.then( results =>
results.map( ({ data, ...d }) =>
({
...d,
data: data.map(di =>
({
groupDate: di.groupDate,
assignedCount: 0,
resolvedCount: 0,
...di.data.reduce((acc,curr) => ({ ...acc, [curr.k]: curr.v }),{})
})
)
})
)
)
This goes to show that even with the fancy features in modern releases, you don't strictly need them, because there have pretty much always been ways to work around this. Even the JavaScript parts just had slightly longer-winded versions before the current "object spread" syntax was available.
So that is really the direction you need to go in. What you certainly don't want is to use "multiple" $lookup stages, or to apply $filter conditions on what could potentially be large arrays. Both forms here also do their best to "filter down" the number of items "joined" from the foreign collection, so as not to breach the BSON limit.
In particular, the "pre 3.6" version relies on a trick where $lookup + $unwind + $match occur in succession, which you can see in the explain output. All of those stages combine into "one" stage, which returns only the items from the foreign collection that match the conditions in the $match. Keeping things "unwound" until we reduce further avoids BSON limit problems, as does the new form with MongoDB 3.6, where the "sub-pipeline" does all the document reduction and grouping before any results are returned.
Your one document sample would return like this:
{
"_id" : ObjectId("5aeb6b71709f43359e0888bb"),
"data" : [
{
"groupDate" : ISODate("2018-05-02T00:00:00Z"),
"assignedCount" : 1,
"resolvedCount" : 0
}
]
}
That is once I expand the date selection to include that date. The date selection itself can of course also be improved and corrected from your original form.
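For reference, the UTC day arithmetic behind start and end can be isolated into a small sketch. The helper names startOfUtcDay and utcDayWindow are hypothetical, and the modulo trick assumes dates after the Unix epoch (non-negative millisecond values).

```javascript
const ONE_DAY_MS = 1000 * 60 * 60 * 24;

// Floor a Date to 00:00:00.000 UTC of the same day.
// Assumes a non-negative timestamp (dates after 1970-01-01).
function startOfUtcDay(date) {
  const ms = date.getTime();
  return new Date(ms - (ms % ONE_DAY_MS));
}

// Window covering `days` whole UTC days, ending at the midnight that follows `now`.
// Intended for a query of the form { $gte: start, $lt: end }.
function utcDayWindow(now, days) {
  const end = new Date(startOfUtcDay(now).getTime() + ONE_DAY_MS);
  const start = new Date(end.getTime() - days * ONE_DAY_MS);
  return { start, end };
}
```

Widening the days argument is all that is needed to pull an older ticket, such as the sample one created on 2018-05-02, into the selection.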
So it seems to make sense that your relationships are actually defined that way, and you simply recorded them "twice". You don't need to, and even if that's not the definition, you should record the relation on the "child" rather than in an array in the parent. We can juggle and merge the parent arrays, but that's counterproductive to establishing the data relations correctly and using them correctly as well.
How about something like this?
db.users.aggregate([
{
$lookup:{ // lookup assigned tickets
from:'tickets',
localField:'assignedTickets',
foreignField:'_id',
as:'assigned',
}
},
{
$lookup:{ // lookup resolved tickets
from:'tickets',
localField:'resolvedTickets',
foreignField:'_id',
as:'resolved',
}
},
{
$project:{
"tickets":{ // merge all tickets into one single array
$concatArrays:[
"$assigned",
"$resolved"
]
}
}
},
{
$unwind:'$tickets' // flatten the 'tickets' array into separate documents
},
{
$group:{ // group by 'createdAt' and 'assignee_id'
_id:{
groupDate:{
$dateFromParts:{
year:{ $year:'$tickets.createdAt' },
month:{ $month:'$tickets.createdAt' },
day:{ $dayOfMonth:'$tickets.createdAt' },
},
},
userId:'$tickets.assignee_id',
},
assignedCount:{ // get the count of assigned tickets
$sum:{
$cond:[
{ // by checking the 'type' field for a value of 'assigned'
$eq:[
'$tickets.type',
'assigned'
]
},
1, // if matching count 1
0 // else 0
]
}
},
resolvedCount:{
$sum:{
$cond:[
{ // by checking the 'type' field for a value of 'resolved'
$eq:[
'$tickets.type',
'resolved'
]
},
1, // if matching count 1
0 // else 0
]
}
},
},
},
{
$sort:{ // sort by 'groupDate' descending
'_id.groupDate':-1
},
},
{
$group:{
_id:'$_id.userId', // group again but only by userId
data:{
$push:{ // create an array
groupDate:'$_id.groupDate',
assignedCount:{
$sum:'$assignedCount'
},
resolvedCount:{
$sum:'$resolvedCount'
}
}
}
}
}
])

Use MongoDB projection for nesting whole documents?

I have a flat collection of documents, where some documents have a parent: ObjectId field, which points to another document from the same collection, e.g.:
{id: 1, metadata: {text: "I'm a parent"}}
{id: 2, metadata: {text: "I'm child 1", parent: 1}}
Now I'd like to retrieve all parents where metadata.text = "I'm a parent", plus their child elements. But I want that data in a nested format, so I can simply process it afterwards without having to look at metadata.parent. The output should look like:
{
id: 1,
metadata: {text: "I'm a parent"},
children: [
{id: 2, metadata: {text: "I'm child 1", parent: 1}}
]
}
(children could also be part of the parent's metadata object if that's easier)
Why don't I save the documents in a nested structure? I don't want to store the data in a nested format in DB, because those documents are part of GridFS.
The main problem is: How can I tell MongoDB to nest a whole document? Or do I have to use Mongo's aggregation framework for that task?
For the sort of "projection" you are asking for, the aggregation framework is the correct tool, as this sort of "document re-shaping" is only really supported there.
The other case is the "parent/child" thing, where you again need to be "creative" when grouping using the aggregation framework. The full operations show what is essentially involved:
db.collection.aggregate([
// Group parent and children together with conditionals
{ "$group": {
"_id": { "$ifNull": [ "$metadata.parent", "$_id" ] },
"metadata": {
"$addToSet": {
"$cond": [
{ "$ifNull": [ "$metadata.parent", false ] },
false,
"$metadata"
]
}
},
"children": {
"$push": {
"$cond": [
{ "$ifNull": [ "$metadata.parent", false ] },
"$$ROOT",
false
]
}
}
}},
// Filter out "false" values
{ "$project": {
"metadata": { "$setDifference": [ "$metadata", [false] ] },
"children": { "$setDifference": [ "$children", [false] ] }
}},
// metadata is an array but should only have one item
{ "$unwind": "$metadata" },
// This is essentially sorting the children as "sets" are un-ordered
{ "$unwind": "$children" },
{ "$sort": { "_id": 1, "children._id": 1 } },
{ "$group": {
"_id": "$_id",
"metadata": { "$first": "$metadata" },
"children": { "$push": "$children" }
}}
])
The main thing here is the $ifNull operator used on the grouping _id. This will choose to $group on the "parent" field where present, otherwise using the general document _id.
Similar things are done with the $cond operator later where the evaluation is made of which data to add to the array or "set". In the following $project the false values are filtered out by use of the $setDifference operator.
If the final $sort and $group seem confusing, the reason is that the operator used is a "set" operator, so the resulting "set" is considered to be un-ordered. That part is just there to make sure that the array contents appear in order of their own _id field.
Without the additional operators from MongoDB 2.6 this can still be done, but just a little differently.
db.collection.aggregate([
{ "$group": {
"_id": { "$ifNull": [ "$metadata.parent", "$_id" ] },
"metadata": {
"$addToSet": {
"$cond": [
{ "$ifNull": [ "$metadata.parent", false ] },
false,
"$metadata"
]
}
},
"children": {
"$push": {
"$cond": [
{ "$ifNull": [ "$metadata.parent", false ] },
{ "_id": "$_id","metadata": "$metadata" },
false
]
}
}
}},
{ "$unwind": "$metadata" },
{ "$match": { "metadata": { "$ne": false } } },
{ "$unwind": "$children" },
{ "$match": { "children": { "$ne": false } } },
{ "$sort": { "_id": 1, "children._id": 1 } },
{ "$group": {
"_id": "$_id",
"metadata": { "$first": "$metadata" },
"children": { "$push": "$children" }
}}
])
Essentially the same thing but without the newer operators introduced in MongoDB 2.6, so this would work in earlier versions as well.
This will all be fine as long as your relationships are a single level of parent and child. For nested levels you would need to invoke a mapReduce process instead.
I wanted a similar result to Neil Lunn's answer except I wanted to fetch all parents regardless of them having children or not. I also wanted to generalise it to work across any collection that had a single level of nested children.
Here's my query, based on Neil Lunn's answer:
db.collection.aggregate([
{
$group: {
_id: {
$ifNull: ["$parent", "$_id"]
},
parent: {
$addToSet: {
$cond: [
{
$ifNull: ["$parent", false]
}, false, "$$ROOT"
]
}
},
children: {
$push: {
$cond: [
{
$ifNull: ["$parent", false]
}, "$$ROOT", false
]
}
}
}
}, {
$project: {
parent: {
$setDifference: ["$parent", [false]]
},
children: {
$setDifference: ["$children", [false]]
}
}
}, {
$unwind: "$parent"
}
])
This results in every parent being returned where the parent field contains the whole parent document and the children field returning either an empty array if the parent has no children or an array of child documents.
{
_id: PARENT_ID
parent: PARENT_OBJECT
children: [CHILD_OBJECTS]
}
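The result shape can be illustrated with a plain-JavaScript sketch of the same single-level grouping. It does not use MongoDB. The helper name nestChildren is hypothetical, and it assumes children carry a top-level parent field holding the parent's _id, as in the query above.

```javascript
// Plain-JavaScript sketch (no MongoDB involved) of single-level parent/child nesting.
// nestChildren is a hypothetical helper; children are assumed to carry a top-level
// `parent` field holding the parent's _id.
function nestChildren(docs) {
  const groups = new Map();
  const groupFor = (id) => {
    if (!groups.has(id)) groups.set(id, { _id: id, parent: null, children: [] });
    return groups.get(id);
  };
  for (const doc of docs) {
    // Same grouping key as $ifNull: ["$parent", "$_id"].
    if (doc.parent == null) groupFor(doc._id).parent = doc;
    else groupFor(doc.parent).children.push(doc);
  }
  // Like $unwind: "$parent": groups whose parent document is absent are dropped.
  return [...groups.values()].filter((g) => g.parent !== null);
}
```

A parent without children comes back with an empty children array, and a child whose parent document is missing from the input is left out of the result.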