I have a flat collection of documents, where some documents have a parent: ObjectId field that points to another document in the same collection, e.g.:
{id: 1, metadata: {text: "I'm a parent"}}
{id: 2, metadata: {text: "I'm child 1", parent: 1}}
Now I'd like to retrieve all parents where metadata.text = "I'm a parent", plus their child elements. But I want that data in a nested format, so I can simply process it afterwards without having to look at metadata.parent. The output should look like:
{
id: 1,
metadata: {text: "I'm a parent"},
children: [
{id: 2, metadata: {text: "I'm child 1", parent: 1}}
]
}
(children could also be part of the parent's metadata object if that's easier)
Why don't I save the documents in a nested structure? I don't want to store the data in a nested format in DB, because those documents are part of GridFS.
The main problem is: How can I tell MongoDB to nest a whole document? Or do I have to use Mongo's aggregation framework for that task?
For the sort of "projection" you are asking for, the aggregation framework is the correct tool, as this sort of "document re-shaping" is only really supported there.
The other part is the "parent/child" relationship, where you again need to be "creative" when grouping with the aggregation framework. The full operation shows what is essentially involved:
db.collection.aggregate([
// Group parent and children together with conditionals
{ "$group": {
"_id": { "$ifNull": [ "$metadata.parent", "$_id" ] },
"metadata": {
"$addToSet": {
"$cond": [
{ "$ifNull": [ "$metadata.parent", false ] },
false,
"$metadata"
]
}
},
"children": {
"$push": {
"$cond": [
{ "$ifNull": [ "$metadata.parent", false ] },
"$$ROOT",
false
]
}
}
}},
// Filter out "false" values
{ "$project": {
"metadata": { "$setDifference": [ "$metadata", [false] ] },
"children": { "$setDifference": [ "$children", [false] ] }
}},
// metadata is an array but should only have one item
{ "$unwind": "$metadata" },
// This is essentially sorting the children as "sets" are un-ordered
{ "$unwind": "$children" },
{ "$sort": { "_id": 1, "children._id": 1 } },
{ "$group": {
"_id": "$_id",
"metadata": { "$first": "$metadata" },
"children": { "$push": "$children" }
}}
])
The main thing here is the $ifNull operator used on the grouping _id. This will choose to $group on the "parent" field where present, otherwise using the general document _id.
The $cond operator is used in a similar way in the accumulators, where it decides which data to add to the array or "set" and emits false otherwise. In the following $project, those false values are filtered out with the $setDifference operator.
If the final $sort and $group seem confusing, the reason is that $setDifference is a "set" operator, so the resulting "set" is considered to be un-ordered. That part is just there to make sure that the array contents appear in order of their own _id field.
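To make the grouping idea easier to follow, here is a rough plain-JavaScript sketch of the same logic run over an in-memory array. It is not MongoDB code: nestByParent and sampleDocs are hypothetical names, it assumes numeric _id values and a single level of nesting, and unlike the pipeline above it also keeps parents that have no children.

```javascript
// Plain-JavaScript sketch of the grouping logic (illustration only, not MongoDB code).
// Assumes numeric _id values and one level of parent/child nesting.
function nestByParent(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // Same idea as { $ifNull: [ "$metadata.parent", "$_id" ] }
    const key = doc.metadata.parent ?? doc._id;
    if (!groups.has(key)) {
      groups.set(key, { _id: key, metadata: null, children: [] });
    }
    const group = groups.get(key);
    if (doc.metadata.parent == null) {
      group.metadata = doc.metadata; // this document is the parent itself
    } else {
      group.children.push(doc);      // this document is a child of `key`
    }
  }
  // Mirror the final $sort: order children by their own _id
  for (const group of groups.values()) {
    group.children.sort((a, b) => a._id - b._id);
  }
  return [...groups.values()];
}

// The two documents from the question, deliberately listed child first
const sampleDocs = [
  { _id: 2, metadata: { text: "I'm child 1", parent: 1 } },
  { _id: 1, metadata: { text: "I'm a parent" } }
];
const nested = nestByParent(sampleDocs);
console.log(JSON.stringify(nested, null, 2));
```

The grouping key is the only "clever" part: children are filed under their parent's id, and parents are filed under their own.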
Without the additional operators from MongoDB 2.6 this can still be done, but just a little differently.
db.collection.aggregate([
{ "$group": {
"_id": { "$ifNull": [ "$metadata.parent", "$_id" ] },
"metadata": {
"$addToSet": {
"$cond": [
{ "$ifNull": [ "$metadata.parent", false ] },
false,
"$metadata"
]
}
},
"children": {
"$push": {
"$cond": [
{ "$ifNull": [ "$metadata.parent", false ] },
{ "_id": "$_id","metadata": "$metadata" },
false
]
}
}
}},
{ "$unwind": "$metadata" },
{ "$match": { "metadata": { "$ne": false } } },
{ "$unwind": "$children" },
{ "$match": { "children": { "$ne": false } } },
{ "$sort": { "_id": 1, "children._id": 1 } },
{ "$group": {
"_id": "$_id",
"metadata": { "$first": "$metadata" },
"children": { "$push": "$children" }
}}
])
Essentially the same thing but without the newer operators introduced in MongoDB 2.6, so this would work in earlier versions as well.
This will all be fine as long as your relationships are a single level of parent and child. For nested levels you would need to invoke a mapReduce process instead.
I wanted a similar result to Neil Lunn's answer except I wanted to fetch all parents regardless of them having children or not. I also wanted to generalise it to work across any collection that had a single level of nested children.
Here's my query based on Neil Lunn's answer
db.collection.aggregate([
{
$group: {
_id: {
$ifNull: ["$parent", "$_id"]
},
parent: {
$addToSet: {
$cond: [
{
$ifNull: ["$parent", false]
}, false, "$$ROOT"
]
}
},
children: {
$push: {
$cond: [
{
$ifNull: ["$parent", false]
}, "$$ROOT", false
]
}
}
}
}, {
$project: {
parent: {
$setDifference: ["$parent", [false]]
},
children: {
$setDifference: ["$children", [false]]
}
}
}, {
$unwind: "$parent"
}
])
This results in every parent being returned where the parent field contains the whole parent document and the children field returning either an empty array if the parent has no children or an array of child documents.
{
_id: PARENT_ID
parent: PARENT_OBJECT
children: [CHILD_OBJECTS]
}
I want to filter 2 collections and return one document.
I have 2 MongoDB collections modelled as such
Analytics_Region
_id:5ecf3445365eca3e58ff57c0,
type:"city"
name:"Toronto"
CSD:"3520005"
CSDTYPE:"C"
PR:"35"
PRNAME:"Ontario"
geometry:Object
country:"CAN"
updatedAt:2021-04-23T18:25:50.774+00:00
province:"ON"
Analytics_Region_Custom
_id:5ecbe871d8ab4ab6845c5142
geometry:Object
name:"henry12"
user:5cbdd019b9d9170007d15990
__v:0
I want to output a single collection in alphabetical order by name,
{
_id: 5ecbe871d8ab4ab6845c5142,
name: "henry12",
type: "custom",
province: null
},
{
_id:5ecf3445365eca3e58ff57c0,
name:"Toronto"
type:"city"
province:"ON",
}
Things to note: in the output, we have added a type of "custom" for every document in Analytics_Region_Custom. We also add a province of null for every such document.
So far I looked into $lookup (to fetch results from another collection) but it does not seem to work for my needs since it adds an array onto every document
You can use $unionWith.
Documents from the second collection are added to the pipeline (with no check for duplicates), and from those documents we project the fields:
if type is missing => "custom"
if province is missing => null
Note: if either field holds a falsy value such as false, 0 or null, the existing value is kept; the default is applied only when the field is missing.
Test code here
db.coll1.aggregate([
{
"$unionWith": {
"coll": "coll2"
}
},
{
"$project": {
"_id": "$_id",
"name": "$name",
"type": {
"$cond": [
{
"$ne": [
{
"$type": "$type"
},
"missing"
]
},
"$type",
"custom"
]
},
"province": {
"$cond": [
{
"$ne": [
{
"$type": "$province"
},
"missing"
]
},
"$province",
null
]
}
}
},
{
"$sort": {
"name": 1
}
}
])
$unionWith to perform a union of both collections
$project to project only the fields that you want
$sort to sort by the name field
db.orders.aggregate([
{
$unionWith: "inventory"
},
{
$project: {
_id: 1,
name: 1,
province: { $cond: { if: "$province", then: "$province", else: null } },
type: { $cond: { if: "$type", then: "$type", else: "custom" } }
}
},
{
$sort: { name: 1 }
}
])
Working example
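The two answers above fill in defaults slightly differently: one checks whether the field is missing, the other checks whether its value evaluates to true. The following plain-JavaScript sketch (not MongoDB code; the function names and the sample documents are made up for illustration) shows where the two strategies diverge.

```javascript
// "Missing only" default, like comparing { $type: "$field" } with "missing"
function defaultIfMissing(doc, field, fallback) {
  return field in doc ? doc[field] : fallback;
}

// Truthiness default, like { $cond: { if: "$field", then: "$field", else: fallback } }.
// In aggregation expressions null, missing, false and 0 evaluate to false;
// strings (even empty ones) evaluate to true.
function defaultIfFalse(doc, field, fallback) {
  const value = doc[field];
  const isFalse =
    value === undefined || value === null || value === false || value === 0;
  return isFalse ? fallback : value;
}

// Hypothetical documents: one has no type at all, one has an explicit null
const noType = { name: "henry12" };
const nullType = { name: "henry12", type: null };

console.log(defaultIfMissing(noType, "type", "custom"));   // custom
console.log(defaultIfFalse(noType, "type", "custom"));     // custom
console.log(defaultIfMissing(nullType, "type", "custom")); // null
console.log(defaultIfFalse(nullType, "type", "custom"));   // custom
```

For the collections in the question both approaches should give the same result; they only differ if a document stores an explicit null, false or 0 in the field.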
I have documents like:
{
"from":"abc#sss.ddd",
"to" :"ssd#dff.dff",
"email": "Hi hello"
}
How can we calculate the combined count for "from and to" and "to and from"?
Like communication counts between two people?
I am able to calculate one way sum. I want to have sum both ways.
db.test.aggregate([
{ $group: {
"_id":{ "from": "$from", "to":"$to"},
"count":{$sum:1}
}
},
{
"$sort" :{"count":-1}
}
])
Since you need to calculate the number of emails exchanged between 2 addresses, it would be fair to project a unified between field as follows:
db.a.aggregate([
{ $match: {
to: { $exists: true },
from: { $exists: true },
email: { $exists: true }
}},
{ $project: {
between: { $cond: {
if: { $lte: [ { $strcasecmp: [ "$to", "$from" ] }, 0 ] },
then: [ { $toLower: "$to" }, { $toLower: "$from" } ],
else: [ { $toLower: "$from" }, { $toLower: "$to" } ] }
}
}},
{ $group: {
"_id": "$between",
"count": { $sum: 1 }
}},
{ $sort :{ count: -1 } }
])
Unification logic should be quite clear from the example: it is an alphabetically sorted array of both emails. The $match and $toLower parts are optional if you trust your data.
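As a rough illustration of the unification step, here is the same idea in plain JavaScript over an in-memory array. This is a sketch, not MongoDB code: countConversations is a hypothetical helper, and the second and third sample messages are invented for the example.

```javascript
// Count messages per unordered pair of addresses (illustration only).
function countConversations(messages) {
  const counts = new Map();
  for (const { from, to } of messages) {
    // Lower-case both addresses and sort them so that A->B and B->A share a key
    const pair = [from.toLowerCase(), to.toLowerCase()].sort();
    const key = JSON.stringify(pair);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return [...counts.entries()]
    .map(([key, count]) => ({ _id: JSON.parse(key), count }))
    .sort((a, b) => b.count - a.count); // like { $sort: { count: -1 } }
}

const conversationCounts = countConversations([
  { from: "abc#sss.ddd", to: "ssd#dff.dff", email: "Hi hello" },
  { from: "ssd#dff.dff", to: "abc#sss.ddd", email: "Hello back" },   // invented
  { from: "abc#sss.ddd", to: "xyz#dff.dff", email: "Other thread" }  // invented
]);
console.log(conversationCounts);
```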
Documentation for operators used in the example:
$match
$exists
$project
$cond
$lte
$strcasecmp
$toLower
$group
$sum
$sort
You basically need to consider the _id for grouping as an "array" of the possible "to" and "from" values, and then of course "sort" them, so that in every document the combination is always in the same order.
Just as a side note, I want to add that "typically" when I am dealing with messaging systems like this, the "to" and "from" sender/recipients are usually both arrays to begin with anyway, so it usually forms the base of where different variations on this statement come from.
First, the most optimal MongoDB 3.2 statement, for single addresses:
db.collection.aggregate([
// Join in array
{ "$project": {
"people": [ "$to", "$from" ],
}},
// Unwind array
{ "$unwind": "$people" },
// Sort array
{ "$sort": { "_id": 1, "people": 1 } },
// Group document
{ "$group": {
"_id": "$_id",
"people": { "$push": "$people" }
}},
// Group people and count
{ "$group": {
"_id": "$people",
"count": { "$sum": 1 }
}}
]);
That's the basics, and now the only variations are in the construction of the "people" array (stage 1 only above).
MongoDB 3.x and 2.6.x - Arrays
{ "$project": {
"people": { "$setUnion": [ "$to", "$from" ] }
}}
MongoDB 3.x and 2.6.x - Fields to array
{ "$project": {
"people": {
"$map": {
"input": ["A","B"],
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "A", "$$el" ] },
"$to",
"$from"
]
}
}
}
}}
MongoDB 2.4.x and 2.2.x - from fields
{ "$project": {
"to": 1,
"from": 1,
"type": { "$const": [ "A", "B" ] }
}},
{ "$unwind": "$type" },
{ "$group": {
"_id": "$_id",
"people": {
"$addToSet": {
"$cond": [
{ "$eq": [ "$type", "A" ] },
"$to",
"$from"
]
}
}
}}
But in all cases:
Get all recipients into a distinct array.
Order the array to a consistent order
Group on the "always in the same order" list of recipients.
Follow that and you cannot go wrong.
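The three steps can be sketched in plain JavaScript as follows. This is only an illustration over an in-memory array, not MongoDB code; countByParticipants is a hypothetical name, and it assumes "from" and "to" may each be either a single address or an array of addresses.

```javascript
// Illustration of: distinct list -> consistent order -> group on that list.
function countByParticipants(messages) {
  const counts = new Map();
  for (const { from, to } of messages) {
    // 1. Get everyone into one distinct array (concat accepts strings or arrays)
    const people = [...new Set([].concat(from, to))];
    // 2. Put the array in a consistent order
    people.sort();
    // 3. Group on the "always in the same order" list
    const key = JSON.stringify(people);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

// Made-up sample data
const participantCounts = countByParticipants([
  { from: "a", to: ["b", "c"] },
  { from: "c", to: ["a", "b"] },
  { from: "a", to: "b" }
]);
console.log([...participantCounts]);
```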
$push is aggregating nulls if the field is not present.
I would like to avoid this.
Is there a way to make a sub expression for $push operator in such way that null values will be skipped and not pushed into the resulting array ?
Bit late to the party, but..
I wanted to do the same thing, and found that I could accomplish it with an expression like this:
// Pushes events only if they have the value 'A'
"events": {
"$push": {
"$cond": [
{
"$eq": [
"$event",
"A"
]
},
"A",
"$noval"
]
}
}
The thinking here is that when you do
{ "$push": "$event" }
then it seems to only push non-null values.
So I made up a column that doesn't exist, $noval, to be returned as the false condition of my $cond.
It seems to work. I'm not sure if it is non-standard and therefore susceptible to breaking one day but..
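On MongoDB 3.6 or newer, the made-up field can probably be swapped for the built-in $$REMOVE system variable, which evaluates to a missing value. The sketch below only builds the pipeline as a JavaScript array; "key" is a placeholder grouping field, and it assumes $push keeps skipping missing values in the way observed above.

```javascript
// Same idea as the "$noval" trick, but using the $$REMOVE system variable
// (MongoDB 3.6+), which evaluates to "missing".
const pushOnlyA = [
  { $group: {
    _id: "$key", // "key" is a placeholder for your grouping field
    events: {
      $push: {
        $cond: [ { $eq: [ "$event", "A" ] }, "A", "$$REMOVE" ]
      }
    }
  }}
];

// In the mongo shell this would be run as: db.collection.aggregate(pushOnlyA)
console.log(JSON.stringify(pushOnlyA, null, 2));
```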
It's really not completely clear what your specific case is without an example. There is the $ifNull operator which can "replace" a null value or missing field with "something else", but to truly "skip" is not possible.
That said, you can always "filter" the results depending on your actual use case.
If your resulting data is actually a "Set" and you have a MongoDB version that is 2.6 or greater then you can use $setDifference with some help from $addToSet to reduce the number of null values that are kept initially:
db.collection.aggregate([
{ "$group": {
"_id": "$key",
"list": { "$addToSet": "$field" }
}},
{ "$project": {
"list": { "$setDifference": [ "$list", [null] ] }
}}
])
So there would only be one null and then the $setDifference operation will "filter" that out in the comparison.
In earlier versions or when the values are not in fact "unique" and not a "set", then you "filter" by processing with $unwind and $match:
db.collection.aggregate([
{ "$group": {
"_id": "$key",
"list": { "$push": "$field" }
}},
{ "$unwind": "$list" },
{ "$match": { "list": { "$ne": null } }},
{ "$group": {
"_id": "$_id",
"list": { "$push": "$list" }
}}
])
If you don't want to be "destructive" of arrays that would end up "empty" because they contained "nothing but" null, then you keep a count using $ifNull and match on the conditions:
db.collection.aggregate([
{ "$group": {
"_id": "$key",
"list": { "$push": "$field" },
"count": {
"$sum": {
"$cond": [
{ "$eq": [ { "$ifNull": [ "$field", null ] }, null ] },
0,
1
]
}
}
}},
{ "$unwind": "$list" },
{ "$match": {
"$or": [
{ "list": { "$ne": null } },
{ "count": 0 }
]
}},
{ "$group": {
"_id": "$_id",
"list": { "$push": "$list" },
// Carry the count forward so the final $project can test it
"count": { "$first": "$count" }
}},
{ "$project": {
"list": {
"$cond": [
{ "$eq": [ "$count", 0 ] },
{ "$const": [] },
"$list"
]
}
}}
])
With a final $project replacing any array that simply consisted of null values only with an empty array object.
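The intended end result of that last pipeline (null values dropped, but groups that held nothing but null kept with an empty array) can be sketched in plain JavaScript like this. It is an illustration over an in-memory array rather than MongoDB code; collectNonNull is a hypothetical name and the sample documents are made up.

```javascript
// Group by `key`, collect non-null `field` values, keep empty groups.
function collectNonNull(docs) {
  const lists = new Map();
  for (const doc of docs) {
    if (!lists.has(doc.key)) {
      lists.set(doc.key, []); // the group exists even if every value is null
    }
    // `!= null` is true only when the value is neither null nor missing
    if (doc.field != null) {
      lists.get(doc.key).push(doc.field);
    }
  }
  return [...lists].map(([key, list]) => ({ _id: key, list }));
}

const collected = collectNonNull([
  { key: "a", field: 1 },
  { key: "a" },               // missing field: skipped
  { key: "b", field: null },  // null only: group kept with an empty list
  { key: "a", field: 1 }      // duplicates are preserved, unlike a "set"
]);
console.log(collected);
```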
I have the following JSON structure in a MongoDB collection:
{
"students":[
{
"name":"ABC",
"fee":1233
},
{
"name":"PQR",
"fee":345
}
],
"studentDept":[
{
"name":"ABC",
"dept":"A"
},
{
"name":"XYZ",
"dept":"X"
}
]
},
{
"students":[
{
"name":"XYZ",
"fee":133
},
{
"name":"LMN",
"fee":56
}
],
"studentDept":[
{
"name":"XYZ",
"dept":"X"
},
{
"name":"LMN",
"dept":"Y"
},
{
"name":"ABC",
"dept":"P"
}
]
}
Now I want to calculate following output.
if students.name = studentDept.name
so my result should be as below
{
"name":"ABC",
"fee":1233,
"dept":"A",
},
{
"name":"XYZ",
"fee":133,
"dept":"X"
}
{
"name":"LMN",
"fee":56,
"dept":"Y"
}
Do I need to use mongo aggregation or is it possible to get above given output without using aggregation???
What you are really asking here is how to make MongoDB return something that is actually quite different from the form in which you store it in your collection. The standard query operations do allow a "limited" form of "projection", but even as the title on the page shared in that link suggests, this is really only about "limiting" the fields to display in results based on what is present in your document already.
So any form of "alteration" requires some form of aggregation: both the aggregate and mapReduce operations allow you to "re-shape" the document results into a form that is different from the input. Perhaps the main thing people miss with the aggregation framework in particular is that it is not all about "aggregating"; in fact the "re-shaping" concept is core to its implementation.
So in order to get results how you want, you can take an approach like this, which should be suitable for most cases:
db.collection.aggregate([
{ "$unwind": "$students" },
{ "$unwind": "$studentDept" },
{ "$group": {
"_id": "$students.name",
"tfee": { "$first": "$students.fee" },
"tdept": {
"$min": {
"$cond": [
{ "$eq": [
"$students.name",
"$studentDept.name"
]},
"$studentDept.dept",
false
]
}
}
}},
{ "$match": { "tdept": { "$ne": false } } },
{ "$sort": { "_id": 1 } },
{ "$project": {
"_id": 0,
"name": "$_id",
"fee": "$tfee",
"dept": "$tdept"
}}
])
Or alternately just "filter out" the cases where the two "name" fields do not match and then just project the content with the fields you want, if crossing content between documents is not important to you:
db.collection.aggregate([
{ "$unwind": "$students" },
{ "$unwind": "$studentDept" },
{ "$project": {
"_id": 0,
"name": "$students.name",
"fee": "$students.fee",
"dept": "$studentDept.dept",
"same": { "$eq": [ "$students.name", "$studentDept.name" ] }
}},
{ "$match": { "same": true } },
{ "$project": {
"name": 1,
"fee": 1,
"dept": 1
}}
])
From MongoDB 2.6 and upwards you can even do the same thing "inline" in the document, between the two arrays. You still want to reshape that array content in your final output, but it is possibly done a little faster:
db.collection.aggregate([
// Compares entries in each array within the document
{ "$project": {
"students": {
"$map": {
"input": "$students",
"as": "stu",
"in": {
"$setDifference": [
{ "$map": {
"input": "$studentDept",
"as": "dept",
"in": {
"$cond": [
{ "$eq": [ "$$stu.name", "$$dept.name" ] },
{
"name": "$$stu.name",
"fee": "$$stu.fee",
"dept": "$$dept.dept"
},
false
]
}
}},
[false]
]
}
}
}
}},
// Students is now an array of arrays. So unwind it twice
{ "$unwind": "$students" },
{ "$unwind": "$students" },
// Rename the fields and exclude
{ "$project": {
"_id": 0,
"name": "$students.name",
"fee": "$students.fee",
"dept": "$students.dept"
}},
])
So where you essentially want to "alter" the structure of the output, you need to use one of the aggregation tools to do so. And you can, even if you are not really aggregating anything.
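For comparison, the per-document matching that the "inline" pipeline performs looks roughly like this in plain JavaScript. This is a client-side sketch, not MongoDB code; joinStudents is a hypothetical helper, and it assumes each name appears at most once in a document's studentDept array.

```javascript
// Match students to departments by name, within each document.
function joinStudents(docs) {
  const rows = [];
  for (const doc of docs) {
    // Index this document's departments by student name
    const deptByName = new Map(doc.studentDept.map((d) => [d.name, d.dept]));
    for (const student of doc.students) {
      if (deptByName.has(student.name)) {
        rows.push({
          name: student.name,
          fee: student.fee,
          dept: deptByName.get(student.name)
        });
      }
    }
  }
  return rows;
}

// The two documents from the question
const joined = joinStudents([
  {
    students: [{ name: "ABC", fee: 1233 }, { name: "PQR", fee: 345 }],
    studentDept: [{ name: "ABC", dept: "A" }, { name: "XYZ", dept: "X" }]
  },
  {
    students: [{ name: "XYZ", fee: 133 }, { name: "LMN", fee: 56 }],
    studentDept: [
      { name: "XYZ", dept: "X" },
      { name: "LMN", dept: "Y" },
      { name: "ABC", dept: "P" }
    ]
  }
]);
console.log(joined);
```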
When querying mongodb, is it possible to process ("project") the result so as to perform array concatenation?
I actually have 2 different scenarios:
(1) Arrays from different fields, e.g.:
Given:
{companyName:'microsoft', managers:['ariel', 'bella'], employees:['charlie', 'don']}
{companyName:'oracle', managers:['elena', 'frank'], employees:['george', 'hugh']}
I'd like my query to return each company with its 'managers' and 'employees' concatenated:
{companyName:'microsoft', allPersonnel:['ariel', 'bella','charlie', 'don']}
{companyName:'oracle', allPersonnel:['elena', 'frank','george', 'hugh']}
(2) Nested arrays, e.g.:
Given the following docs, where employees are separated into nested arrays (never mind why, it's a long story):
{companyName:'microsoft', personnel:[ ['ariel', 'bella'], ['charlie', 'don'] ]}
{companyName:'oracle', personnel:[ ['elena', 'frank'], ['george', 'hugh'] ]}
I'd like my query to return each company with a flattened 'personnel' array:
{companyName:'microsoft', allPersonnel:['ariel', 'bella','charlie', 'don']}
{companyName:'oracle', allPersonnel:['elena', 'frank','george', 'hugh']}
I'd appreciate any ideas, using either 'find' or 'aggregate'
Thanks a lot :)
Of course, in modern MongoDB releases we can simply use $concatArrays here:
db.collection.aggregate([
{ "$project": {
"companyName": 1,
"allPersonnel": { "$concatArrays": [ "$managers", "$employees" ] }
}}
])
Or for the second form with nested arrays, using $reduce in combination:
db.collection.aggregate([
{ "$project": {
"companyName": 1,
"allPersonnel": {
"$reduce": {
"input": "$personnel",
"initialValue": [],
"in": { "$concatArrays": [ "$$value", "$$this" ] }
}
}
}}
])
There is the $setUnion operator available to the aggregation framework. The constraint here is that these are "sets" and all the members are actually "unique" as a "set" requires:
db.collection.aggregate([
{ "$project": {
"companyName": 1,
"allPersonnel": { "$setUnion": [ "$managers", "$employees" ] }
}}
])
So that is cool, as long as all are "unique" and you are in singular arrays.
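The difference between concatenating, taking a union and flattening is easy to see in plain JavaScript. The arrays below are made-up sample data (with "bella" deliberately in both lists), and the array methods are only rough analogues of the MongoDB operators, not the operators themselves.

```javascript
// Made-up data: "bella" appears in both arrays on purpose
const managers = ["ariel", "bella"];
const employees = ["bella", "charlie"];

// Like $concatArrays: keeps duplicates and order
const concatenated = managers.concat(employees);

// Like $setUnion: duplicates removed (MongoDB does not guarantee the order)
const union = [...new Set(concatenated)];

// Like $reduce with $concatArrays: flattens one level of nesting
const flattened = [["ariel", "bella"], ["charlie", "don"]].flat();

console.log(concatenated); // [ 'ariel', 'bella', 'bella', 'charlie' ]
console.log(union);        // [ 'ariel', 'bella', 'charlie' ]
console.log(flattened);    // [ 'ariel', 'bella', 'charlie', 'don' ]
```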
In the alternate case you can always process with $unwind and $group. The personnel nested array is a simple double unwind:
db.collection.aggregate([
{ "$unwind": "$personnel" },
{ "$unwind": "$personnel" },
{ "$group": {
"_id": "$_id",
"companyName": { "$first": "$companyName" },
"allPersonnel": { "$push": "$personnel" }
}}
])
Or the same thing as the first one for versions earlier than MongoDB 2.6 where the "set operators" did not exist:
db.collection.aggregate([
{ "$project": {
"type": { "$const": [ "M", "E" ] },
"companyName": 1,
"managers": 1,
"employees": 1
}},
{ "$unwind": "$type" },
{ "$unwind": "$managers" },
{ "$unwind": "$employees" },
{ "$group": {
"_id": "$_id",
"companyName": { "$first": "$companyName" },
"allPersonnel": {
"$addToSet": {
"$cond": [
{ "$eq": [ "$type", "M" ] },
"$managers",
"$employees"
]
}
}
}}
])