MongoDb $lookup query with multiple fields from objects array

This question was previously marked as a duplicate of another question; I can confirm with certainty that it is not.
It is not a duplicate of the linked question because the elements in question are not an array but fields embedded in the individual objects of an array. I am fully aware of how the query in the linked question works; however, that scenario is different from mine.
I have a question about the $lookup stage in MongoDB. My data structure looks as follows:
My "Event" collection contains this single document:
{
"_id": ObjectId("mongodbobjectid..."),
"name": "Some Event",
"attendees": [
{
"type": 1,
"status": 2,
"contact": ObjectId("mongodbobjectidHEX1")
},
{
"type": 7,
"status": 4,
"contact": ObjectId("mongodbobjectidHEX2")
}
]
}
My "Contact" collection contains these documents:
{
"_id": ObjectId("mongodbobjectidHEX1"),
"name": "John Doe",
"age": 35
},
{
"_id": ObjectId("mongodbobjectidHEX2"),
"name": "Peter Pan",
"age": 60
}
What I want to do is run an aggregation with a $lookup stage on the "Event" collection and get the following result, with the full "contact" data:
{
"_id": ObjectId("mongodbobjectid..."),
"name": "Some Event",
"attendees": [
{
"type": 1,
"status": 2,
"contact": {
"_id": ObjectId("mongodbobjectidHEX1"),
"name": "John Doe",
"age": 35
}
},
{
"type": 7,
"status": 4,
"contact": {
"_id": ObjectId("mongodbobjectidHEX2"),
"name": "Peter Pan",
"age": 60
}
}
]
}
I have done this with single "Contact" references in other documents, but never when the references are embedded in an array. Which pipeline stages should I use to get the result shown above?
I also want to add a $match query to the pipeline to filter the data, but that is not really part of my question.

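Try this:
db.getCollection('Event').aggregate([
  // One document per attendee
  { "$unwind": "$attendees" },
  // Fetch the matching Contact document into a temporary array
  { "$lookup": {
      "from": "Contact",
      "localField": "attendees.contact",
      "foreignField": "_id",
      "as": "contactlist"
  } },
  // contactlist holds at most one contact (matched on _id); turn it into an object.
  // Attendees with no matching contact are dropped at this step.
  { "$unwind": "$contactlist" },
  // Replace the ObjectId reference with the full contact document
  { "$project": {
      "attendees.type": 1,
      "attendees.status": 1,
      "attendees.contact": "$contactlist",
      "name": 1, "_id": 1
  } },
  // Rebuild the attendees array, one output document per event
  { "$group": {
      "_id": "$_id",
      "name": { "$first": "$name" },
      "attendees": { "$push": "$attendees" }
  } }
])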
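To see why this $unwind, $lookup, $unwind, $project, $group sequence rebuilds the attendees array, here is a plain-JavaScript sketch that mimics those stages on in-memory copies of the sample documents. It is not MongoDB itself: the function name lookupAttendees is hypothetical, and plain strings such as "HEX1" stand in for the ObjectId values.

```javascript
// Hypothetical in-memory stand-ins for the two collections; plain strings
// such as "HEX1" replace the ObjectId values.
const events = [{
  _id: "evt1",
  name: "Some Event",
  attendees: [
    { type: 1, status: 2, contact: "HEX1" },
    { type: 7, status: 4, contact: "HEX2" },
  ],
}];
const contacts = [
  { _id: "HEX1", name: "John Doe", age: 35 },
  { _id: "HEX2", name: "Peter Pan", age: 60 },
];

// Mimics $unwind -> $lookup -> $unwind -> $project -> $group in plain JavaScript.
function lookupAttendees(events, contacts) {
  // $unwind "attendees": one row per (event, attendee) pair
  const unwound = events.flatMap(ev =>
    ev.attendees.map(att => ({ ...ev, attendees: att })));
  // $lookup: every contact whose _id equals attendees.contact goes into "contactlist"
  const joined = unwound.map(row => ({
    ...row,
    contactlist: contacts.filter(c => c._id === row.attendees.contact),
  }));
  // $unwind "contactlist": rows whose contactlist is empty are dropped
  const matched = joined.flatMap(row =>
    row.contactlist.map(c => ({ ...row, contactlist: c })));
  // $project: keep type and status, replace the reference with the full contact
  const projected = matched.map(row => ({
    _id: row._id,
    name: row.name,
    attendees: {
      type: row.attendees.type,
      status: row.attendees.status,
      contact: row.contactlist,
    },
  }));
  // $group on _id: $first for name, $push for the rebuilt attendees array
  const groups = new Map();
  for (const row of projected) {
    if (!groups.has(row._id)) {
      groups.set(row._id, { _id: row._id, name: row.name, attendees: [] });
    }
    groups.get(row._id).attendees.push(row.attendees);
  }
  return [...groups.values()];
}

console.log(JSON.stringify(lookupAttendees(events, contacts), null, 2));
```

In the real pipeline, $group does not guarantee the order of its output documents, so add a $sort if that matters. If attendees without a matching contact should be kept, the second $unwind can be written as { "$unwind": { "path": "$contactlist", "preserveNullAndEmptyArrays": true } }.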

Related

Mongodb $filter on Embedded Documents, return all collection data? [duplicate]

This question already has answers here:
Include all existing fields and add new fields to document
I am using MongoDB and I have a document that looks like this:
{
"_id": "5ad9a24be78f9d33888d2567",
"tag": [],
"active": 1,
"code": "_CAROT",
"name": [
{
"lang": "uk",
"translation": "carot"
},
{
"lang": "fr",
"translation": "carotte"
}
],
"season": [],
"category": [],
"createdAt": "2018-04-23T07:59:51.261Z",
"updatedAt": "2018-04-23T07:59:51.261Z",
"__v": 0
}
I want to filter on lang so that only one translation is returned, so I am using aggregate with $filter. This is what I do:
db.products.aggregate(
[ {$match: {'name.lang': "fr"}},
{$project: { name: {$filter: {
input: '$name',
as: 'item',
cond: {$eq: ['$$item.lang', "fr"]}
}}
}}
])
And I get:
{ "_id" : ObjectId("5ad9a24be78f9d33888d2567"), "name" : [ { "lang" : "fr", "translation" : "carotte" } ] }
{ "_id" : ObjectId("5add96fedf3aac3d049196ca"), "name" : [ { "lang" : "fr", "translation" : "tomate" } ] }
However I would like to get the following result :
{
"_id": "5ad9a24be78f9d33888d2567",
"tag": [],
"active": 1,
"code": "_CAROT",
"name": [
{
"lang": "fr",
"translation": "carotte"
}
],
"season": [],
"category": [],
"createdAt": "2018-04-23T07:59:51.261Z",
"updatedAt": "2018-04-23T07:59:51.261Z",
"__v": 0
}
Basically, I want the full default document, with only the "fr" entry in the "name" field.
Is there a way to do this in MongoDB?
Thanks a lot

You can use the $unwind aggregation stage to "unwrap" the translations. For each element of the name array, it outputs a copy of the document in which name holds just that one translation object instead of the whole array. Then filter with $match to keep only the "fr" translations.
Note that you will have to list every field you want in the final result in the $project stage.
Example:
db.products.aggregate([
{ $unwind: '$name' }, // one document per element of the name array
{ $match: { 'name.lang': 'fr' } }, // only keep the French translations
{ $project: { _id: 0, code: 1, 'name.translation': 1 } } // return code + the French translation
])
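As an illustration of what each stage of this answer does, here is a plain-JavaScript sketch that mimics $unwind, $match and $project on one in-memory product. The function name translationsFor is hypothetical, and this is an emulation, not MongoDB.

```javascript
// Hypothetical sample: one product shaped like the document in the question.
const products = [{
  _id: "5ad9a24be78f9d33888d2567",
  code: "_CAROT",
  name: [
    { lang: "uk", translation: "carot" },
    { lang: "fr", translation: "carotte" },
  ],
}];

// Mimics $unwind -> $match -> $project from the answer.
function translationsFor(docs, lang) {
  return docs
    // $unwind "$name": one copy of the document per element of the name array
    .flatMap(doc => doc.name.map(entry => ({ ...doc, name: entry })))
    // $match { "name.lang": lang }: keep only the requested language
    .filter(doc => doc.name.lang === lang)
    // $project { _id: 0, code: 1, "name.translation": 1 }
    .map(doc => ({ code: doc.code, name: { translation: doc.name.translation } }));
}

console.log(translationsFor(products, "fr"));
```

Note that after $unwind, name holds a single object rather than a one-element array, so the output shape differs slightly from the one asked for in the question.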

MongoDB query in java for array fields using ProjectionOperation

Below is the JSON structure of the Store documents:
[
{
"_id":"87348378",
"name": "ABC store",
"type": "Books",
"books": [
{
"name": "love",
"id": "1",
"types":{
"type":"love",
"number":"1"
}
},
{
"name": "coreman",
"id": "2",
"types":{
"type":"love",
"number":"1"
}
}
]
},
{
"_id":"87348",
"name": "Some store",
"type": "Books",
"books": [
{
"name": "JAVA",
"id": "1",
"types":{
"type":"Programming",
"number":"2"
}
},
{
"name": "coreman",
"id": "2",
"types":{
"type":"Programming",
"number":"3"
}
}
]
}
]
I need to get all the stores of the bookstore type ("type": "Books"), but I only want to return certain fields from the books array (name, and type from types).
I am using the following query to get the result:
db.getCollection('store').aggregate([{
"$project" : { "name" : 1 , "type" : 1 ,
"books":{"name":1,"types":{"type":1}}}
},
{ "$group" : { "_id" : "$type",
"books" : { "$push" : "$$ROOT"}}}])
Could anyone help me write the same query with Spring/Java?
Note: I have seen the andInclude method below in the Spring docs.
reference: https://docs.spring.io/spring-data/data-mongo/docs/current/api/org/springframework/data/mongodb/core/aggregation/ProjectionOperation.html
Package org.springframework.data.mongodb.core.aggregation
public ProjectionOperation andInclude(String... fieldNames)
Includes the given fields into the projection.
Parameters: fieldNames - must not be null.
public ProjectionOperation andInclude(Fields fields)
Includes the given fields into the projection.
Parameters: fields - must not be null.

mongo $unwind and $group

I have two collections. I want documents in one of them to hold references to documents in the other, and have those references populated when the documents are returned.
Here is an example of the JSON I am trying to get as the result:
{
"title": "Some Title",
"uid": "some-title",
"created_at": "1412159926",
"updated_at": "1412159926",
"id": "1",
"metadata": {
"date": "2016-10-17",
"description": "a description"
},
"tags": [
{
"name": "Tag 1",
"uid": "tag-1"
},
{
"name": "Tag 2",
"uid": "tag-2"
},
{
"name": "Tag 3",
"uid": "tag-3"
}
]
}
Here is the mongo query I have, which gets me close, but it nests the original body of the item within the _id object.
db.tracks.aggregate([{
$unwind: "$tags"
}, {
$lookup: {
from: "tags",
localField: "tags",
foreignField: "_id",
as: "tags"
}
}, {
$unwind: "$tags"
}, {
$group: {
"_id": {
"title": "$title",
"uid": "$uid",
"metadata": "$metadata"
},
"tags": {
"$push": "$tags"
}
}
}])
So the result is this:
{
"_id" : {
"title" : "Some Title",
"uid" : "some-title",
"metadata" : {
"date" : "2016-10-17",
"description" : "a description"
}
},
"tags" : [
{
"_id" : ObjectId("580499d06fe29ce7093fb53a"),
"name" : "Tag 1",
"uid" : "tag-1"
},
{
"_id" : ObjectId("580499d06fe29ce7093fb53b"),
"name" : "Tag 2",
"uid" : "tag-2"
}
]
}
Is there a way to achieve the desired output? Also, is there a way to avoid listing in the $group all the fields I want to return? I would like to return the original object, but with the referenced documents in the tags array.

Since you initially pivoted your original documents on the tags array field, the documents are denormalized, so your $group stage should use the _id field as its _id key and access the other fields with the $first or $last operator.
The $group stage is similar to SQL's GROUP BY clause. In SQL, every selected column that is not part of the GROUP BY key has to go through an aggregate function. In the same way, every field that $group outputs besides _id needs an accumulator in MongoDB, so unfortunately there is no way around listing in the $group stage every field you want to return, each with the $first or $last operator:
db.tracks.aggregate([
{ "$unwind": "$tags" },
{
"$lookup": {
"from": "tags",
"localField": "tags",
"foreignField": "_id",
"as": "resultingArray"
}
},
{ "$unwind": "$resultingArray" },
{
"$group": {
"_id": "$_id",
"title": { "$first": "$title" },
"uid": { "$first": "$uid" },
"created_at": { "$first": "$created_at" },
"updated_at": { "$first": "$updated_at" },
"id": { "$first": "$id" },
"metadata": { "$first": "$metadata" },
"tags": { "$push": "$resultingArray" }
}
}
])
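The accumulator semantics described above can be illustrated with a plain-JavaScript sketch: grouping the denormalized rows on _id, fields in a list take the value from the first row of each group (like $first), and the looked-up tags are appended in arrival order (like $push). The function name groupFirstPush and the trimmed sample rows are hypothetical; this is an in-memory emulation, not MongoDB.

```javascript
// Hypothetical input: the three denormalized documents that $unwind ->
// $lookup -> $unwind produce for one track (other fields trimmed).
const rows = ["Tag 1", "Tag 2", "Tag 3"].map((name, i) => ({
  _id: "track1",
  title: "Some Title",
  uid: "some-title",
  resultingArray: { name, uid: `tag-${i + 1}` },
}));

// Groups rows on _id. Fields listed in firstFields take the value from the
// first row of each group ($first); resultingArray values are collected into
// "tags" in arrival order ($push).
function groupFirstPush(rows, firstFields) {
  const groups = new Map();
  for (const row of rows) {
    if (!groups.has(row._id)) {
      const doc = { _id: row._id };
      for (const field of firstFields) doc[field] = row[field];
      doc.tags = [];
      groups.set(row._id, doc);
    }
    groups.get(row._id).tags.push(row.resultingArray);
  }
  return [...groups.values()];
}

console.log(JSON.stringify(groupFirstPush(rows, ["title", "uid"]), null, 2));
```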
One trick I always use when a pipeline gives unexpected results is to run the aggregation with just the first pipeline stage. If that gives the expected result, add the next stage, and so on.
For the pipeline above, you would first run just the $unwind; if that works, add the $lookup. This helps you narrow down which stage is causing the problem. In this case, since you suspect the $group stage, you could run the pipeline with just the first three stages and inspect the documents it produces:
db.tracks.aggregate([
{ "$unwind": "$tags" },
{
"$lookup": {
"from": "tags",
"localField": "tags",
"foreignField": "_id",
"as": "resultingArray"
}
},
{ "$unwind": "$resultingArray" }
])
which yields the following output:
/* 1 */
{
"_id" : ObjectId("5804a6c900ce8cbd028523d9"),
"title" : "Some Title",
"uid" : "some-title",
"created_at" : "1412159926",
"updated_at" : "1412159926",
"id" : "1",
"metadata" : {
"date" : "2016-10-17",
"description" : "a description"
},
"resultingArray" : {
"name" : "Tag 1",
"uid" : "tag-1"
}
}
/* 2 */
{
"_id" : ObjectId("5804a6c900ce8cbd028523d9"),
"title" : "Some Title",
"uid" : "some-title",
"created_at" : "1412159926",
"updated_at" : "1412159926",
"id" : "1",
"metadata" : {
"date" : "2016-10-17",
"description" : "a description"
},
"resultingArray" : {
"name" : "Tag 2",
"uid" : "tag-2"
}
}
/* 3 */
{
"_id" : ObjectId("5804a6c900ce8cbd028523d9"),
"title" : "Some Title",
"uid" : "some-title",
"created_at" : "1412159926",
"updated_at" : "1412159926",
"id" : "1",
"metadata" : {
"date" : "2016-10-17",
"description" : "a description"
},
"resultingArray" : {
"name" : "Tag 3",
"uid" : "tag-3"
}
}
From inspection you can see that, for each input document, the last stage outputs 3 documents, where 3 is the number of elements in the original tags array. They all share the same _id and the same values for the other fields; only the resultingArray field differs. So you get the desired result by adding a $group stage that groups the documents by _id and retrieves the other fields with the $first or $last operator, as in the solution given above:
db.tracks.aggregate([
{ "$unwind": "$tags" },
{
"$lookup": {
"from": "tags",
"localField": "tags",
"foreignField": "_id",
"as": "resultingArray"
}
},
{ "$unwind": "$resultingArray" },
{
"$group": {
"_id": "$_id",
"title": { "$first": "$title" },
"uid": { "$first": "$uid" },
"created_at": { "$first": "$created_at" },
"updated_at": { "$first": "$updated_at" },
"id": { "$first": "$id" },
"metadata": { "$first": "$metadata" },
"tags": { "$push": "$resultingArray" }
}
}
])
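The stage-by-stage debugging trick is easy to script if the pipeline is kept in a plain array. In this sketch the generator name pipelinePrefixes is hypothetical and the $group stage is shortened; it produces each prefix of the pipeline in turn. In the mongo shell each prefix could be passed to db.tracks.aggregate(); here the stage names are simply printed.

```javascript
// The pipeline from the answer, stored as a plain array so it can be sliced
// ($group shortened for brevity).
const pipeline = [
  { $unwind: "$tags" },
  { $lookup: { from: "tags", localField: "tags", foreignField: "_id", as: "resultingArray" } },
  { $unwind: "$resultingArray" },
  { $group: { _id: "$_id", tags: { $push: "$resultingArray" } } },
];

// Yields the pipeline truncated after stage 1, 2, ..., n.
function* pipelinePrefixes(stages) {
  for (let n = 1; n <= stages.length; n++) {
    yield stages.slice(0, n);
  }
}

for (const prefix of pipelinePrefixes(pipeline)) {
  // In the mongo shell you would run db.tracks.aggregate(prefix) and inspect the result
  console.log(prefix.map(stage => Object.keys(stage)[0]).join(" -> "));
}
```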

Combining multiple sub-documents into a new doc in mongo

I am trying to query multiple sub-documents in MongoDB and return as a single doc.
I think the aggregation framework is the way to go, but I can't seem to get it exactly right.
Take the following docs:
{
"board_id": "1",
"hosts":
[{
"name": "bob",
"ip": "10.1.2.3"
},
{
"name": "tom",
"ip": "10.1.2.4"
}]
}
{
"board_id": "2",
"hosts":
[{
"name": "mickey",
"ip": "10.2.2.3"
},
{
"name": "mouse",
"ip": "10.2.2.4"
}]
}
{
"board_id": "3",
"hosts":
[{
"name": "pavel",
"ip": "10.3.2.3"
},
{
"name": "kenrick",
"ip": "10.3.2.4"
}]
}
Trying to get a query result like this:
{
"hosts":
[{
"name": "bob",
"ip": "10.1.2.3"
},
{
"name": "tom",
"ip": "10.1.2.4"
},
{
"name": "mickey",
"ip": "10.2.2.3"
},
{
"name": "mouse",
"ip": "10.2.2.4"
},
{
"name": "pavel",
"ip": "10.3.2.3"
},
{
"name": "kenrick",
"ip": "10.3.2.4"
}]
}
I've tried this:
db.collection.aggregate([ { $unwind: '$hosts' }, { $project : { name: 1, hosts: 1, _id: 0 }} ])
But it's not quite what I want.

You can definitely do this with aggregate. Let's assume your data is in a collection named board; replace that with your actual collection name.
db.board.aggregate([
{$unwind:"$hosts"},
{$group:{_id:null, hosts:{$addToSet:"$hosts"}}},
{$project:{_id:0, hosts:1}}
]).pretty()
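It will return the following (note that $addToSet also discards duplicate hosts and does not preserve the original order of the elements):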
{
"hosts" : [
{
"name" : "kenrick",
"ip" : "10.3.2.4"
},
{
"name" : "pavel",
"ip" : "10.3.2.3"
},
{
"name" : "mouse",
"ip" : "10.2.2.4"
},
{
"name" : "mickey",
"ip" : "10.2.2.3"
},
{
"name" : "tom",
"ip" : "10.1.2.4"
},
{
"name" : "bob",
"ip" : "10.1.2.3"
}
]
}

Your basic problem here is that the arrays are contained in separate documents. While you are correct to $unwind the array for processing, to bring the content into a single array you need to $group the results across documents and $push the content into the result array:
db.collection.aggregate([
{ "$unwind": "$hosts" },
{ "$group": {
"_id": null,
"hosts": { "$push": "$hosts" }
}}
])
So just as $unwind "deconstructs" the array elements, the $push accumulator in $group "reconstructs" the array. And since there is no other key to group on, this brings all the elements into a single array.
Note that a null grouping key is only really practical when the resulting document stays under the 16 MB BSON document size limit. Otherwise you are better off leaving the individual elements as separate documents.
Optionally remove the _id with an additional $project if required.
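The difference between the two answers' accumulators can be sketched in plain JavaScript: $unwind followed by a null-keyed $group with $push keeps every host, while $addToSet skips duplicates. The function names are hypothetical, the sample is trimmed to two boards, and duplicates are detected with JSON.stringify, which only approximates MongoDB's document equality (both are sensitive to field order). Unlike MongoDB's $addToSet, whose output order is unspecified, this sketch keeps first-seen order.

```javascript
// Hypothetical sample: two of the board documents from the question.
const boards = [
  { board_id: "1", hosts: [{ name: "bob", ip: "10.1.2.3" }, { name: "tom", ip: "10.1.2.4" }] },
  { board_id: "2", hosts: [{ name: "mickey", ip: "10.2.2.3" }, { name: "mouse", ip: "10.2.2.4" }] },
];

// $unwind "$hosts", then $group { _id: null, hosts: { $push: "$hosts" } },
// then $project { _id: 0, hosts: 1 }
function mergeHosts(docs) {
  const unwound = docs.flatMap(d => d.hosts.map(h => ({ ...d, hosts: h })));
  return { hosts: unwound.map(d => d.hosts) };
}

// Same, but with $addToSet semantics: each distinct host is kept once
function mergeHostsUnique(docs) {
  const seen = new Set();
  const hosts = [];
  for (const h of docs.flatMap(d => d.hosts)) {
    const key = JSON.stringify(h);
    if (!seen.has(key)) {
      seen.add(key);
      hosts.push(h);
    }
  }
  return { hosts };
}

console.log(mergeHosts(boards).hosts.map(h => h.name));
```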

Querying for a Date Range

I have the following data schema:
{
"Address" : "Test1",
"City" : "London",
"Country" : "UK",
"Currency" : "",
"Price_History" : {
"2014-07-04T02:42:58" : [
{
"value1" : 98,
"value2" : 98,
"value3" : 98
}
],
"2014-07-04T03:50:50" : [
{
"value1" : 91,
"value2" : 92,
"value3" : 93
}
]
},
"Location" : [
9.3435,
52.1014
],
"Postal_code" : "xxx"
}
How can I write a query in MongoDB that finds all results between "2014-07-04T02:42:58" and "2014-07-04T03:50:50"? Or, alternatively, how can I write a query that selects only results with values from 91 to 93 without knowing the date?
Thanks

This is not a very good way to model the data. A better example would be as follows:
{
"Address" : "Test1",
"City" : "London",
"Country" : "UK",
"Currency" : "",
"Price_History" : [
{ "dateEntry": 1, "date": ISODate("2014-07-04T02:42:58Z"), "value": 98 },
{ "dateEntry": 2, "date": ISODate("2014-07-04T02:42:58Z"), "value": 98 },
{ "dateEntry": 3, "date": ISODate("2014-07-04T02:42:58Z"), "value": 98 },
{ "dateEntry": 1, "date": ISODate("2014-07-04T03:50:50Z"), "value": 91 },
{ "dateEntry": 2, "date": ISODate("2014-07-04T03:50:50Z"), "value": 92 },
{ "dateEntry": 3, "date": ISODate("2014-07-04T03:50:50Z"), "value": 93 }
],
"Location" : [
9.3435,
52.1014
],
"Postal_code" : "xxx"
}
Or something along those lines that does not store data in the key names (path dependency). Your query would then be relatively simple, but keep in mind that MongoDB queries select documents, not individual array elements, so to return only the matching entries you need to take the array apart with the aggregation framework:
db.collection.aggregate([
// Still match first to reduce the possible documents
{ "$match": {
"Price_History": {
"$elemMatch": {
"date": {
"$gte": ISODate("2014-07-04T02:42:58Z"),
"$lte": ISODate("2014-07-04T03:50:50Z")
},
"value": 98
}
}
}},
// Unwind to "de-normalize"
{ "$unwind": "$Price_History" },
// Match this time to "filter" the array which is now documents
{ "$match": {
"Price_History.date": {
"$gte": ISODate("2014-07-04T02:42:58Z"),
"$lte": ISODate("2014-07-04T03:50:50Z")
},
"Price_History.value": 98
}},
// Now group back each document with the matches
{ "$group": {
"_id": "$_id",
"Address": { "$first": "$Address" },
"City": { "$first": "$City" },
"Country": { "$first": "$Country" },
"Currency": { "$first": "$Currency" },
"Price_History": { "$push": "$Price_History" },
"Location": { "$first": "$Location" },
"Postal_code": { "$first": "$Postal_code" }
}}
])
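The effect of this pipeline can be sketched in plain JavaScript: the first $match (with $elemMatch) keeps documents that have at least one matching entry, and $unwind plus the second $match plus $group then leave only the matching entries in Price_History. The function name filterPriceHistory and the trimmed sample document are hypothetical; this is an in-memory emulation, not MongoDB.

```javascript
// Hypothetical sample: one document in the suggested array-based shape (trimmed).
const sampleDocs = [{
  _id: 1,
  Address: "Test1",
  Price_History: [
    { dateEntry: 1, date: new Date("2014-07-04T02:42:58Z"), value: 98 },
    { dateEntry: 1, date: new Date("2014-07-04T03:50:50Z"), value: 91 },
    { dateEntry: 2, date: new Date("2014-07-04T03:50:50Z"), value: 92 },
  ],
}];

// Keeps documents with at least one matching entry (first $match), then keeps
// only the matching entries ($unwind + second $match + $group back on _id).
function filterPriceHistory(docs, from, to, value) {
  const matches = entry =>
    entry.date >= from && entry.date <= to && entry.value === value;
  return docs
    .filter(doc => doc.Price_History.some(matches))
    .map(doc => ({ ...doc, Price_History: doc.Price_History.filter(matches) }));
}

const from = new Date("2014-07-04T02:42:58Z");
const to = new Date("2014-07-04T03:50:50Z");
console.log(JSON.stringify(filterPriceHistory(sampleDocs, from, to, 98), null, 2));
```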
Otherwise, you are better off changing the "normalization" and just going for discrete documents that you can process with a standard .find(). Much faster and simpler.
{
"Address" : "Test1",
"City" : "London",
"Country" : "UK",
"Currency" : "",
"date": ISODate("2014-07-04T02:42:58Z"),
"value": 98
}
Etc. So then just query:
db.collection.find({
"date": {
"$gte": ISODate("2014-07-04T02:42:58Z"),
"$lte": ISODate("2014-07-04T03:50:50Z")
},
"value": 98
})
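With one price entry per document, the second part of the question (values from 91 to 93 at any date) becomes an ordinary range condition, db.collection.find({ "value": { "$gte": 91, "$lte": 93 } }), with the date condition simply left out. This plain-JavaScript sketch mimics such find() filters on in-memory documents; findPrices and its optional bounds are hypothetical, and an omitted bound stands for leaving that condition out of the query.

```javascript
// Hypothetical "Price History" documents in the flat, one-entry-per-document shape.
const priceDocs = [
  { Address: "Test1", date: new Date("2014-07-04T02:42:58Z"), value: 98 },
  { Address: "Test1", date: new Date("2014-07-04T03:50:50Z"), value: 91 },
  { Address: "Test1", date: new Date("2014-07-04T03:50:50Z"), value: 92 },
  { Address: "Test1", date: new Date("2014-07-04T03:50:50Z"), value: 93 },
];

// In-memory equivalent of
//   find({ date: { $gte: from, $lte: to }, value: { $gte: min, $lte: max } })
// where any bound left undefined is treated as "no condition".
function findPrices(docs, { from, to, min, max }) {
  return docs.filter(d =>
    (from === undefined || d.date >= from) &&
    (to === undefined || d.date <= to) &&
    (min === undefined || d.value >= min) &&
    (max === undefined || d.value <= max));
}

// Values 91..93 without knowing the date
console.log(findPrices(priceDocs, { min: 91, max: 93 }).map(d => d.value)); // [ 91, 92, 93 ]
```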
I would really go with that as a "de-normalized" "Price History" collection, since it is much more efficient and is basically what the aggregation statement is emulating.
The query you ask for with the current model is possible using something that evaluates JavaScript, like MongoDB mapReduce, but it would need to scan the entire collection without any index assistance, and that is bad.
Take your case to the boss to re-model and earn your bonus now.
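If you do re-model, the existing Price_History objects have to be converted once. Here is a plain-JavaScript sketch of that conversion into the array shape suggested in the answer; toPriceHistoryArray is a hypothetical name, treating the timestamp keys as UTC is an assumption, and writing the converted arrays back to the collection is not shown.

```javascript
// Hypothetical input: the Price_History object from the question, whose keys are timestamps.
const oldPriceHistory = {
  "2014-07-04T02:42:58": [{ value1: 98, value2: 98, value3: 98 }],
  "2014-07-04T03:50:50": [{ value1: 91, value2: 92, value3: 93 }],
};

// Flattens it into one { dateEntry, date, value } entry per valueN field.
// Assumption: the timestamp keys are UTC, hence the appended "Z".
function toPriceHistoryArray(history) {
  const entries = [];
  for (const [stamp, snapshots] of Object.entries(history)) {
    const date = new Date(stamp + "Z");
    for (const snapshot of snapshots) {
      for (const [key, value] of Object.entries(snapshot)) {
        // "value1" -> dateEntry 1, "value2" -> dateEntry 2, ...
        entries.push({ dateEntry: Number(key.replace("value", "")), date, value });
      }
    }
  }
  return entries;
}

console.log(toPriceHistoryArray(oldPriceHistory));
```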