mongo $unwind and $group - mongodb

I have two collections. I want documents in one to hold references to documents in the other, and to have those references populated in the query result.
Here is an example of the JSON I am trying to get as the result:
{
"title": "Some Title",
"uid": "some-title",
"created_at": "1412159926",
"updated_at": "1412159926",
"id": "1",
"metadata": {
"date": "2016-10-17",
"description": "a description"
},
"tags": [
{
"name": "Tag 1",
"uid": "tag-1"
},
{
"name": "Tag 2",
"uid": "tag-2"
},
{
"name": "Tag 3",
"uid": "tag-3"
}
]
}
Here is the mongo query I have, which gets me close, but it nests the original body of the item within the _id object.
db.tracks.aggregate([{
$unwind: "$tags"
}, {
$lookup: {
from: "tags",
localField: "tags",
foreignField: "_id",
as: "tags"
}
}, {
$unwind: "$tags"
}, {
$group: {
"_id": {
"title": "$title",
"uid": "$uid",
"metadata": "$metadata"
},
"tags": {
"$push": "$tags"
}
}
}])
So the result is this:
{
"_id" : {
"title" : "Some Title",
"uid" : "some-title",
"metadata" : {
"date" : "2016-10-17",
"description" : "a description"
}
},
"tags" : [
{
"_id" : ObjectId("580499d06fe29ce7093fb53a"),
"name" : "Tag 1",
"uid" : "tag-1"
},
{
"_id" : ObjectId("580499d06fe29ce7093fb53b"),
"name" : "Tag 2",
"uid" : "tag-2"
}
]
}
Is there a way to achieve the desired output? Also, is there a way to avoid defining in the $group every field I wish to return? I would like to get back the original object, but with the referenced documents in the tags array.

Your first $unwind flattens each document on its tags array, so every track is denormalized into one document per tag. To reassemble it, the $group stage should use the document's _id field as its _id key and pick up the other fields with the $first or $last operator.
The $group stage is similar to SQL's GROUP BY clause. In SQL, every selected column must either appear in the GROUP BY or be wrapped in an aggregate function. Likewise, in a MongoDB $group every field other than _id must come from an accumulator. So in this $group you do have to list every field you want returned, applying $first or $last to each:
db.tracks.aggregate([
{ "$unwind": "$tags" },
{
"$lookup": {
"from": "tags",
"localField": "tags",
"foreignField": "_id",
"as": "resultingArray"
}
},
{ "$unwind": "$resultingArray" },
{
"$group": {
"_id": "$_id",
"title": { "$first": "$title" },
"uid": { "$first": "$uid" },
"created_at": { "$first": "$created_at" },
"updated_at": { "$first": "$updated_at" },
"id": { "$first": "$id" },
"metadata": { "$first": "$metadata" },
"tags": { "$push": "$resultingArray" }
}
}
])
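To see why grouping on _id with $first restores the original document, here is a minimal sketch that emulates the $unwind and $group stages in plain JavaScript, with no MongoDB involved. The helpers unwind and groupById are hypothetical names made up for illustration, the sample track is assumed data, and only the simple cases used in this answer are covered.

```javascript
// Hypothetical stand-ins for $unwind and $group; these are not MongoDB APIs.
function unwind(docs, field) {
  // One output document per array element; documents with empty arrays vanish.
  return docs.flatMap(doc =>
    (doc[field] || []).map(item => ({ ...doc, [field]: item }))
  );
}

function groupById(docs, pushField, asField) {
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc._id)) {
      // Like $first on every other field: copy them from the first document seen.
      const { [pushField]: _ignored, ...rest } = doc;
      groups.set(doc._id, { ...rest, [asField]: [] });
    }
    // Like $push: collect the per-document value into the array.
    groups.get(doc._id)[asField].push(doc[pushField]);
  }
  return [...groups.values()];
}

// Assumed sample data, shaped like a track with two tag references.
const tracks = [
  { _id: 1, title: "Some Title", uid: "some-title", tags: ["tag-1", "tag-2"] }
];

const unwound = unwind(tracks, "tags");
const regrouped = groupById(unwound, "tags", "tags");
console.log(unwound.length, JSON.stringify(regrouped));
```

Because every unwound copy carries identical values for the non-array fields, taking them from the first copy loses nothing.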
One trick I always use whenever I want to debug a pipeline that's giving unexpected results is to run the aggregation with just the first pipeline operator. If that gives the expected result, add the next.
In the answer above, you'd first try aggregating just the $unwind; if that works, add the $lookup. This can help you narrow down which operator is causing issues. In this case, you could run the pipeline with just the first three steps since you believe the $group is the one causing issues and then inspect the resulting documents from that pipeline:
db.tracks.aggregate([
{ "$unwind": "$tags" },
{
"$lookup": {
"from": "tags",
"localField": "tags",
"foreignField": "_id",
"as": "resultingArray"
}
},
{ "$unwind": "$resultingArray" }
])
which yields the output
/* 1 */
{
"_id" : ObjectId("5804a6c900ce8cbd028523d9"),
"title" : "Some Title",
"uid" : "some-title",
"created_at" : "1412159926",
"updated_at" : "1412159926",
"id" : "1",
"metadata" : {
"date" : "2016-10-17",
"description" : "a description"
},
"resultingArray" : {
"name" : "Tag 1",
"uid" : "tag-1"
}
}
/* 2 */
{
"_id" : ObjectId("5804a6c900ce8cbd028523d9"),
"title" : "Some Title",
"uid" : "some-title",
"created_at" : "1412159926",
"updated_at" : "1412159926",
"id" : "1",
"metadata" : {
"date" : "2016-10-17",
"description" : "a description"
},
"resultingArray" : {
"name" : "Tag 2",
"uid" : "tag-2"
}
}
/* 3 */
{
"_id" : ObjectId("5804a6c900ce8cbd028523d9"),
"title" : "Some Title",
"uid" : "some-title",
"created_at" : "1412159926",
"updated_at" : "1412159926",
"id" : "1",
"metadata" : {
"date" : "2016-10-17",
"description" : "a description"
},
"resultingArray" : {
"name" : "Tag 3",
"uid" : "tag-3"
}
}
On inspection, each input document yields three documents at this point, one per element of the original tags array. They share the same _id and all other fields; only resultingArray differs. That is why grouping by _id, pushing resultingArray into tags, and taking the remaining fields with $first or $last gives the desired result, as in the pipeline given at the start of this answer.
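The stage-by-stage debugging approach described above can be scripted. This is only a sketch: pipelinePrefixes is a hypothetical helper, not part of the mongo shell, and the call to db.tracks.aggregate is left as a comment so the snippet runs without a database.

```javascript
// Hypothetical helper: returns [stage1], [stage1, stage2], ... so each
// growing prefix of the pipeline can be run and inspected on its own.
function pipelinePrefixes(pipeline) {
  return pipeline.map((_, i) => pipeline.slice(0, i + 1));
}

const pipeline = [
  { $unwind: "$tags" },
  { $lookup: { from: "tags", localField: "tags", foreignField: "_id", as: "resultingArray" } },
  { $unwind: "$resultingArray" }
];

const prefixes = pipelinePrefixes(pipeline);
for (const prefix of prefixes) {
  // In the mongo shell you would inspect each step with:
  //   db.tracks.aggregate(prefix)
  console.log(prefix.map(stage => Object.keys(stage)[0]).join(" -> "));
}
```

The first prefix whose output looks wrong points at the stage to fix.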

Related

How can I group by a string instead of ObjectId in MongoDB aggregate?

I have two collections and a many-to-one relationship between them:
Product:
"_id" : ObjectId("61cc81c9585946c3b44f24411"),
"name" : "some random name",
"price" : 100,
"description" : "description",
"category_id" : ObjectId("61cc8100585946c3b44f2317d")
Category:
{
"_id" : ObjectId("61cc8100585946c3b44f2317d"),
"description" : "Category description",
"name" : "Electronics"
}
I would like to output the maximum product price for each category:
db.product.aggregate([
{ "$group": {
"_id": "$category_id",
"max": { "$max": "$price"}
}}
])
This works just fine as it prints me the following:
{ "_id" : ObjectId("61cc80fb585946c3b44f697c"), "max" : 62}
{ "_id" : ObjectId("61cc8100585946c3b44f697d"), "max" : 100}
But is there a way to get the "name" from the Category instead of its object id?
I know in SQL you would group by category_name but it does not seem to work here.
As suggested by @prasad, you should add a $lookup stage after your $group stage, then use $arrayElemAt to pull the name out of the joined array.
db.product.aggregate([
{
"$group": {
"_id": "$category_id",
"max": {
"$max": "$price"
}
}
},
{
"$lookup": {
"from": "category",
"localField": "_id",
"foreignField": "_id",
"as": "categoryName",
}
},
{
"$set": {
"categoryName": {
"$arrayElemAt": [
"$categoryName.name",
0
]
}
}
}
])
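As a rough illustration of what the three stages compute, the following plain JavaScript sketch performs the same steps in memory: group by category_id keeping the maximum price, then resolve each group key to a category name. The product and category documents are assumed sample data (with short string ids instead of ObjectIds), and nothing here talks to MongoDB.

```javascript
// Assumed sample data; ids shortened for readability.
const products = [
  { name: "a", price: 62, category_id: "c1" },
  { name: "b", price: 100, category_id: "c2" },
  { name: "c", price: 40, category_id: "c2" }
];
const categories = [
  { _id: "c1", name: "Books" },
  { _id: "c2", name: "Electronics" }
];

// Like $group with $max: highest price per category_id.
const maxByCategory = new Map();
for (const p of products) {
  const current = maxByCategory.get(p.category_id);
  maxByCategory.set(
    p.category_id,
    current === undefined ? p.price : Math.max(current, p.price)
  );
}

// Like $lookup followed by $arrayElemAt: turn each group key into a name.
const nameById = new Map(categories.map(c => [c._id, c.name]));
const result = [...maxByCategory].map(([id, max]) => ({
  _id: id,
  max,
  categoryName: nameById.get(id)
}));

console.log(result);
```

Grouping first keeps the join small: the lookup runs once per category rather than once per product.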

Mongodb aggregate lookup three collections

I have been learning MongoDB for the past two days and am trying to aggregate three collections, but I have been unable to get it working.
Below are the three collections in the database:
t_credentials
{
"_id" : "619ca68b624c41e408348406",
"title" : "Company ID"
}
t_groups
{
"_id" : "61a253da88ca12a37218898d",
"group_name" : "Gold"
}
t_user_credentials
{
"_id" : "619ca88a624c41e408348424",
"credential_id" : "619ca68b624c41e408348406",
"group_id" : "61a253da88ca12a37218898d",
"identifiers" : {
"first_name" : "Lee",
"middle_name" : "Min",
"last_name" : "Ho"
},
"created_at" : "2021-12-01T17:20:49.000Z"
}
Here I am trying to achieve the output in the below format:
Expected Output
[{
"_id" : "619ca88a624c41e408348424",
"first_name" : ,
"middle_name" : ,
"last_name" : ,
"credential" : {
"_id" : "619ca68b624c41e408348406",
"title" : "Company ID"
},
"group" : {
"_id" : "61a253da88ca12a37218898d",
"group_name" : "Gold"
},
"created_at" : "2021-12-01T17:20:49.000Z"
}]
However, I only get the fields from t_user_credentials and cannot get the output into the format above.
Query
db.t_user_credentials.aggregate([
{
$lookup: {
from: "t_credentials",
localField: "_id",
foreignField: "credential_id",
as: "credentials"
}
},
{
$unwind: {
path:'$credentials',
preserveNullAndEmptyArrays: true
}
},
{
$lookup: {
from: "t_groups",
localField: "_id",
foreignField: "group_id",
as: "groups"
}
},
{
$unwind: {
path: '$groups',
preserveNullAndEmptyArrays: true
}
},
{
$project: {
last_name: "$identifiers.last_name",
first_name: "$identifiers.first_name",
middle_name: "$identifiers.middle_name",
"credentials.title": 1,
created_at: 1,
group_id: 1
}
}
])
Can anyone help me solve this issue? It would be very helpful.
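Your $lookup stages have localField and foreignField swapped: localField must name the field in t_user_credentials (credential_id, group_id) and foreignField the field in the joined collection (_id). With that fixed, this query unwinds the credential and group arrays, uses $replaceWith with $mergeObjects to merge the identifiers sub-document into the $$ROOT document, and finally uses $unset to remove the fields we are no longer interested in. Both $replaceWith and the $unset stage require MongoDB 4.2 or later.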
Consider the following:
Database
db={
"t_credentials": [
{
"_id": "619ca68b624c41e408348406",
"title": "Company ID"
}
],
"t_groups": [
{
"_id": "61a253da88ca12a37218898d",
"group_name": "Gold"
}
],
"t_user_credentials": [
{
"_id": "619ca88a624c41e408348424",
"credential_id": "619ca68b624c41e408348406",
"group_id": "61a253da88ca12a37218898d",
"identifiers": {
"first_name": "Lee",
"middle_name": "Min",
"last_name": "Ho"
},
"created_at": "2021-12-01T17:20:49.000Z"
}
]
}
Query
db.t_user_credentials.aggregate([
{
"$lookup": {
"from": "t_credentials",
"localField": "credential_id",
"foreignField": "_id",
"as": "credential"
}
},
{
"$lookup": {
"from": "t_groups",
"localField": "group_id",
"foreignField": "_id",
"as": "group"
}
},
{
$unwind: "$group",
},
{
$unwind: "$credential"
},
{
$replaceWith: {
$mergeObjects: [
"$$ROOT",
"$identifiers"
]
}
},
{
$unset: [
"group_id",
"credential_id",
"identifiers"
]
}
])
Result
[
{
"_id": "619ca88a624c41e408348424",
"created_at": "2021-12-01T17:20:49.000Z",
"credential": {
"_id": "619ca68b624c41e408348406",
"title": "Company ID"
},
"first_name": "Lee",
"group": {
"_id": "61a253da88ca12a37218898d",
"group_name": "Gold"
},
"last_name": "Ho",
"middle_name": "Min"
}
]
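The flattening step is the part that tends to confuse, so here is a small plain JavaScript analogue of $mergeObjects followed by $unset. It is a sketch that runs without MongoDB; the variable names merged and flattened are illustrative, and the document is the one from the question without the joined credential and group fields.

```javascript
const doc = {
  _id: "619ca88a624c41e408348424",
  credential_id: "619ca68b624c41e408348406",
  group_id: "61a253da88ca12a37218898d",
  identifiers: { first_name: "Lee", middle_name: "Min", last_name: "Ho" },
  created_at: "2021-12-01T17:20:49.000Z"
};

// Like $replaceWith + $mergeObjects ["$$ROOT", "$identifiers"]:
// copy the root, then lay the sub-document's keys on top of it.
const merged = { ...doc, ...doc.identifiers };

// Like $unset: drop the fields that are no longer wanted.
const { group_id, credential_id, identifiers, ...flattened } = merged;

console.log(flattened);
```

As with $mergeObjects, later objects win when keys clash, so a field named the same in both the root and identifiers would take the value from identifiers.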

Mongodb group and push with empty arrays

I'm having a problem with a group when there is an array that could be empty.
The collection could be like this:
{
"_id" : "Contract_1",
"ContactId" : "Contact_1",
"Specifications" : [ ]
}
{
"_id" : "Contract_2",
"ContactId" : "Contact_2",
"Specifications" : [
{
"Description" : "Descrizione1",
"VehicleId" : "Vehicle_1",
"Customizations" : [
{
"Description" : "Random furniture",
"ContactId" : "Contact_5"
},
{
"Description" : "Random furniture 2",
"ContactId" : "Contact_3"
}
]
},
{
"Description" : "Descrizione2",
"VehicleId" : "Vehicle_2",
"Customizations" : [
{
"Description" : "Random furniture",
"ContactId" : "Contact_5"
},
{
"Description" : "Random furniture 2",
"ContactId" : "Contact_3"
}
]
}
]
}
{
"_id" : "Contract_3",
"ContactId" : "Contact_25",
"Specifications" : [
{
"Description" : "Descrizione1",
"VehicleId" : "Vehicle_1",
"Customizations" : []
},
{
"Description" : "Descrizione2",
"VehicleId" : "Vehicle_2",
"Customizations" : []
}
]
}
As you can see, Specifications can sometimes be empty, and so can Customizations.
And this is the query that I execute:
db.getCollection("Contract").aggregate([
{ "$lookup": {
"from": "Contact",
"localField": "ContactId",
"foreignField": "_id",
"as": "Contact"
}},
{ "$unwind": {"path":"$Contact", "preserveNullAndEmptyArrays":true }},
{ "$unwind": { "path": "$Specifications", "preserveNullAndEmptyArrays":true }},
{ "$lookup": {
"from": "Vehicle",
"localField": "Specifications.VehicleId",
"foreignField": "_id",
"as": "Specifications.Vehicle"
}},
{ "$unwind": {"path": "$Specifications.Vehicle","preserveNullAndEmptyArrays":true} },
{ "$unwind": {"path": "$Specifications.Customizations","preserveNullAndEmptyArrays":true} },
{ "$lookup": {
"from": "Contact",
"localField": "Specifications.Customizations.ContactId",
"foreignField": "_id",
"as": "Specifications.Customizations.Contact"
}},
{ "$unwind": {"path": "$Specifications.Customizations.Contact","preserveNullAndEmptyArrays":true} },
{ "$group": {
"_id": {
"_id": "$_id",
"Description": "$Specifications.Description"
},
"ContactId": { "$first": "$ContactId" },
"Contact": { "$first": "$Contact" },
"Specifications": {
"$push": "$Specifications.Customizations"
}
}},
{ "$group": {
"_id": "$_id._id",
"ContactId": { "$first": "$ContactId" },
"Contact": { "$first": "$Contact" },
"Specifications": {
"$push": {
"Description": "$_id.Description",
"Customizations": "$Specifications"
}
}
}}
])
When the query executes, the two $group stages cause a problem: in the first one, pushing $Specifications.Customizations creates an array with an empty element inside. What I want is that if Specifications is an empty array, it stays empty, without an empty element being added.
As far as I can see, this is one of the drawbacks of using $unwind and $group on nested arrays. To get rid of the empty elements, add one more stage, $addFields, to filter them out.
Add this at the end of the pipeline:
{ "$addFields": {
"Specifications": {
"$filter": {
"input": "$Specifications",
"cond": { "$ne": ["$$this.Description", undefined] }
}
}
}}
For anyone who still has the same problem as me and for whom the answer from @Ashh isn't working (I can't figure out why it doesn't work for me): using $ifNull instead of $ne is what worked for me, like this:
{ "$addFields": {
"Specifications": {
"$filter": {
"input": "$Specifications",
"cond": { "$ifNull": ["$$this.Description", false] }
}
}
}}
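The effect of the $ifNull condition can be sketched in plain JavaScript as "keep only elements that actually have a Description". The grouped documents below are assumed examples of what the two $group stages might leave behind, not output captured from a real run, and keepDescribed is a hypothetical helper name.

```javascript
// Like $filter with cond { $ifNull: ["$$this.Description", false] }:
// an element survives only if its Description is neither missing nor null.
const keepDescribed = specs =>
  specs.filter(s => s.Description !== undefined && s.Description !== null);

// Assumed shape for a contract whose Specifications array was empty:
// the regrouping leaves a placeholder element with no Description.
const emptyContract = {
  _id: "Contract_1",
  Specifications: [{ Customizations: [{}] }]
};

// Assumed shape for a contract that had a real specification.
const filledContract = {
  _id: "Contract_3",
  Specifications: [{ Description: "Descrizione1", Customizations: [] }]
};

const cleanedEmpty = keepDescribed(emptyContract.Specifications);
const cleanedFilled = keepDescribed(filledContract.Specifications);
console.log(cleanedEmpty.length, cleanedFilled.length);
```

Note that the filter only cleans the outer Specifications array; placeholder elements inside a Customizations array would need the same treatment one level down.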

$lookup when localField is nested

MongoDB version 3.4.10 (Application is using Meteor framework)
Objective: Aggregate documents that are referenced by _id into the containing document as required at runtime.
I have Materials, Models, and Catalog collections with the following documents:
Materials
{ "_id" : "cf4KgXw7ZK6ukdzR7", "name" : "parquet_wood_mahogany" }
Models
{
"_id" : "Mwp5eYYZ4GZzvZuoK",
"name" : "top_square_chamfered",
"type" : "top"
}
{
"_id" : "CqhS2m2RcLZ2Bm4eb",
"name" : "skirt_square",
"type" : "skirt"
}
{
"_id" : "dYP22ajALnWBwpBj2",
"name" : "leg_square",
"type" : "leg"
}
Catalog
{
"_id" : "EcRGzPAq79giYKrbY",
...,
"specs" : {
...,
"models" : [
{
"mesh" : "Mwp5eYYZ4GZzvZuoK",
"material" : "cf4KgXw7ZK6ukdzR7"
},
{
"mesh" : "CqhS2m2RcLZ2Bm4eb",
"material" : "cf4KgXw7ZK6ukdzR7"
},
{
"mesh" : "dYP22ajALnWBwpBj2",
"material" : "cf4KgXw7ZK6ukdzR7"
}
]
}
}
Desired returned document format after aggregation:
{
"_id" : "EcRGzPAq79giYKrbY",
...,
"specs" : {
"dimensions" : {
...,
},
"models" : [
{
"mesh" : {
"_id" : "Mwp5eYYZ4GZzvZuoK",
"name" : "top_square_chamfered",
"type" : "top"
},
"material" : {
"_id" : "cf4KgXw7ZK6ukdzR7",
"name" : "parquet_wood_mahogany"
}
},
{
"mesh" : {
"_id" : "CqhS2m2RcLZ2Bm4eb",
"name" : "skirt_square",
"type" : "skirt"
},
"material" : {
"_id" : "cf4KgXw7ZK6ukdzR7",
"name" : "parquet_wood_mahogany"
}
},
{
"mesh" : {
"_id" : "dYP22ajALnWBwpBj2",
"name" : "leg_square",
"type" : "leg"
},
"material" : {
"_id" : "cf4KgXw7ZK6ukdzR7",
"name" : "parquet_wood_mahogany"
}
}
]
}
}
I haven't included any of my query code because it is so far off the mark as to just be noise. I've been trying to use aggregate, with $lookup combinations, but I'm not getting anywhere close to what I'm after. The MongoDB v3.6 pipeline syntax would make this much easier... but I'm at a complete loss in v3.4.
I would like to avoid using multiple database requests to combine this information if at all possible. Any assistance or advice would be greatly appreciated!
EDIT: Working solution -
db.catalog.aggregate([
{ "$lookup": {
"from": 'models',
"localField": "specs.models.mesh",
"foreignField": "_id",
"as": "models.mesh"
}},
{ "$lookup": {
"from": 'materials',
"localField": "specs.models.material",
"foreignField": "_id",
"as": "models.material"
}},
{ "$unwind": "$models.mesh" },
{ "$unwind": "$models.material" },
{ "$group":{
"_id": "$_id",
"title": { "$first": "$title" },
"desc": { "$first": "$desc" },
"thumbnail": { "$first": "$thumbnail" },
"createdBy": { "$first": "$createdBy" },
"createdAt": { "$first": "$createdAt" },
"specs": { "$first": "$specs" },
"models": { "$push": "$models" }
}},
{ "$project": {
"_id": "$_id",
"title": "$title",
"desc": "$desc",
"thumbnail": "$thumbnail",
"createdBy": "$createdBy",
"createdAt": "$createdAt",
"specs.dimensions": "$specs.dimensions",
"specs.models": "$models",
}}
])
You can try the aggregation below:
db.catalog.aggregate([
{ "$lookup": {
"from": 'models',
"localField": "specs.models.mesh",
"foreignField": "_id",
"as": "models.mesh"
}},
{ "$lookup": {
"from": 'materials',
"localField": "specs.models.material",
"foreignField": "_id",
"as": "models.material"
}},
{ "$unwind": "$models.mesh" },
{ "$unwind": "$models.material" },
{ "$group":{
"_id": "$_id",
"title": { "$first": "$title" },
"desc": { "$first": "$desc" },
"thumbnail": { "$first": "$thumbnail" },
"createdBy": { "$first": "$createdBy" },
"createdAt": { "$first": "$createdAt" },
"specs": { "$first": "$specs" },
"models": { "$push": "$models" }
}},
{ "$project": {
"title": "$title",
"desc": "$desc",
"thumbnail": "$thumbnail",
"createdBy": "$createdBy",
"createdAt": "$createdAt",
"specs.dimensions": "$specs.dimensions",
"specs.models": "$models",
}}
])
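When checking the output of a pipeline like this, it can help to have the desired shape computed independently. The sketch below builds it in plain JavaScript from a subset of the documents in the question; the byId helper and the populated variable are illustrative names, and this is a client-side reference for comparison, not a replacement for the aggregation.

```javascript
const materials = [{ _id: "cf4KgXw7ZK6ukdzR7", name: "parquet_wood_mahogany" }];
const models = [
  { _id: "Mwp5eYYZ4GZzvZuoK", name: "top_square_chamfered", type: "top" },
  { _id: "CqhS2m2RcLZ2Bm4eb", name: "skirt_square", type: "skirt" }
];
const catalogItem = {
  _id: "EcRGzPAq79giYKrbY",
  specs: {
    models: [
      { mesh: "Mwp5eYYZ4GZzvZuoK", material: "cf4KgXw7ZK6ukdzR7" },
      { mesh: "CqhS2m2RcLZ2Bm4eb", material: "cf4KgXw7ZK6ukdzR7" }
    ]
  }
};

// Index each referenced collection by _id for constant-time lookups.
const byId = docs => new Map(docs.map(d => [d._id, d]));
const modelById = byId(models);
const materialById = byId(materials);

// Replace each reference inside specs.models with the referenced document,
// keeping mesh and material paired element by element.
const populated = {
  ...catalogItem,
  specs: {
    ...catalogItem.specs,
    models: catalogItem.specs.models.map(m => ({
      mesh: modelById.get(m.mesh),
      material: materialById.get(m.material)
    }))
  }
};

console.log(JSON.stringify(populated, null, 2));
```

Comparing this against the aggregation result is a quick way to confirm that each mesh ended up paired with the right material.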

MongoDB nested lookup with 3 levels

I need to retrieve the entire hierarchy of a single object from the database as JSON. Suggestions for any other way to achieve this result would also be highly appreciated. I decided to use MongoDB with its $lookup support.
So I have three collections:
party
{ "_id" : "2", "name" : "party2" }
{ "_id" : "5", "name" : "party5" }
{ "_id" : "4", "name" : "party4" }
{ "_id" : "1", "name" : "party1" }
{ "_id" : "3", "name" : "party3" }
address
{ "_id" : "a3", "street" : "Address3", "party_id" : "2" }
{ "_id" : "a6", "street" : "Address6", "party_id" : "5" }
{ "_id" : "a1", "street" : "Address1", "party_id" : "1" }
{ "_id" : "a5", "street" : "Address5", "party_id" : "5" }
{ "_id" : "a2", "street" : "Address2", "party_id" : "1" }
{ "_id" : "a4", "street" : "Address4", "party_id" : "3" }
addressComment
{ "_id" : "ac2", "address_id" : "a1", "comment" : "Comment2" }
{ "_id" : "ac1", "address_id" : "a1", "comment" : "Comment1" }
{ "_id" : "ac5", "address_id" : "a5", "comment" : "Comment6" }
{ "_id" : "ac4", "address_id" : "a3", "comment" : "Comment4" }
{ "_id" : "ac3", "address_id" : "a2", "comment" : "Comment3" }
I need to retrieve all parties with all corresponding addresses and address comments as part of the record. My aggregation:
db.party.aggregate([{
$lookup: {
from: "address",
localField: "_id",
foreignField: "party_id",
as: "address"
}
},
{
$unwind: "$address"
},
{
$lookup: {
from: "addressComment",
localField: "address._id",
foreignField: "address_id",
as: "address.addressComment"
}
}])
The result is pretty weird. Some records are ok. But Party with _id: 4 is missing (there is no address for it). Also, there are two Party _id: 1 in the result set (but with different addresses):
{
"_id": "1",
"name": "party1",
"address": {
"_id": "2",
"street": "Address2",
"party_id": "1",
"addressComment": [{
"_id": "3",
"address_id": "2",
"comment": "Comment3"
}]
}
}{
"_id": "1",
"name": "party1",
"address": {
"_id": "1",
"street": "Address1",
"party_id": "1",
"addressComment": [{
"_id": "1",
"address_id": "1",
"comment": "Comment1"
},
{
"_id": "2",
"address_id": "1",
"comment": "Comment2"
}]
}
}{
"_id": "3",
"name": "party3",
"address": {
"_id": "4",
"street": "Address4",
"party_id": "3",
"addressComment": []
}
}{
"_id": "5",
"name": "party5",
"address": {
"_id": "5",
"street": "Address5",
"party_id": "5",
"addressComment": [{
"_id": "5",
"address_id": "5",
"comment": "Comment5"
}]
}
}{
"_id": "2",
"name": "party2",
"address": {
"_id": "3",
"street": "Address3",
"party_id": "2",
"addressComment": [{
"_id": "4",
"address_id": "3",
"comment": "Comment4"
}]
}
}
Please help me with this. I'm pretty new to MongoDB but I feel it can do what I need from it.
The cause of your 'troubles' is the second aggregation stage, { $unwind: "$address" }. It removes the record for the party with _id: 4 (because its address array is empty, as you mention) and produces two records each for the parties with _id: 1 and _id: 5 (because each of them has two addresses).
To keep parties without addresses, set the preserveNullAndEmptyArrays option of the $unwind stage to true.
To avoid duplicating a party for each of its addresses, add a $group stage to your pipeline. Also, use a $project stage with the $filter operator to exclude empty address records from the output.
db.party.aggregate([{
$lookup: {
from: "address",
localField: "_id",
foreignField: "party_id",
as: "address"
}
}, {
$unwind: {
path: "$address",
preserveNullAndEmptyArrays: true
}
}, {
$lookup: {
from: "addressComment",
localField: "address._id",
foreignField: "address_id",
as: "address.addressComment",
}
}, {
$group: {
_id : "$_id",
name: { $first: "$name" },
address: { $push: "$address" }
}
}, {
$project: {
_id: 1,
name: 1,
address: {
$filter: { input: "$address", as: "a", cond: { $ifNull: ["$$a._id", false] } }
}
}
}]);
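The difference preserveNullAndEmptyArrays makes can be seen in a small plain JavaScript emulation. The unwind function below is a hypothetical, simplified stand-in for the $unwind stage (for example, it drops a null field entirely when preserving), and the joined documents are assumed data shaped like the output of the first $lookup.

```javascript
// Simplified emulation of $unwind; not a MongoDB API.
function unwind(docs, field, preserveNullAndEmptyArrays = false) {
  return docs.flatMap(doc => {
    const value = doc[field];
    const isEmpty = value === undefined || value === null ||
      (Array.isArray(value) && value.length === 0);
    if (isEmpty) {
      // Default behaviour: the document disappears from the output.
      if (!preserveNullAndEmptyArrays) return [];
      // With the option set: keep the document, minus the empty field.
      const { [field]: _dropped, ...rest } = doc;
      return [rest];
    }
    // A non-array value is treated as a one-element array.
    const items = Array.isArray(value) ? value : [value];
    return items.map(item => ({ ...doc, [field]: item }));
  });
}

// Assumed data: party 1 has two addresses, party 4 has none.
const joined = [
  { _id: "1", name: "party1", address: [{ _id: "a1" }, { _id: "a2" }] },
  { _id: "4", name: "party4", address: [] }
];

const dropped = unwind(joined, "address");
const preserved = unwind(joined, "address", true);
console.log(dropped.length, preserved.length);
```

Party 4 only survives in the second call, which is why the later $filter has to remove the address placeholder it leaves behind after grouping.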
With the pipeline form of $lookup, available in MongoDB 3.6 and above, it is quite simple to join nested collections without an intermediate $unwind.
db.party.aggregate([
{ "$lookup": {
"from": "address",
"let": { "partyId": "$_id" },
"pipeline": [
{ "$match": { "$expr": { "$eq": ["$party_id", "$$partyId"] }}},
{ "$lookup": {
"from": "addressComment",
"let": { "addressId": "$_id" },
"pipeline": [
{ "$match": { "$expr": { "$eq": ["$address_id", "$$addressId"] }}}
],
"as": "addressComment"
}}
],
"as": "address"
}}
])
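For reference, the nested shape that a party, address, comment join is aiming for can be built in plain JavaScript. This is a sketch over a subset of the question's data, run entirely in memory; it assumes the comments are stored under an addressComment field, as in the question's own pipeline.

```javascript
const parties = [
  { _id: "1", name: "party1" },
  { _id: "4", name: "party4" }
];
const addresses = [
  { _id: "a1", street: "Address1", party_id: "1" },
  { _id: "a2", street: "Address2", party_id: "1" }
];
const comments = [
  { _id: "ac1", address_id: "a1", comment: "Comment1" },
  { _id: "ac2", address_id: "a1", comment: "Comment2" },
  { _id: "ac3", address_id: "a2", comment: "Comment3" }
];

// Outer join: each party gets an array of its addresses (possibly empty).
// Inner join: each address gets an array of its comments.
const nested = parties.map(party => ({
  ...party,
  address: addresses
    .filter(a => a.party_id === party._id)
    .map(a => ({
      ...a,
      addressComment: comments.filter(c => c.address_id === a._id)
    }))
}));

console.log(JSON.stringify(nested, null, 2));
```

Each party appears exactly once, and a party without addresses keeps an empty address array instead of disappearing.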