MongoDB's aggregation from nested key returns nothing - mongodb

(Edit : this question was edited to better reflect the issue, which might be a little more complicated than the proposed related question.)
Let's say I have these two collections
products
{
_id: 'AAAA',
components: [
{ type: 'foo', items: [
{ itemId: 'item1', qty: 2 },
{ itemId: 'item2', qty: 1 }
] },
{ type: 'bar', items: [
{ itemId: 'item3', qty: 8 }
] }
]
}
items
{
_id: 'item1',
name: 'Foo Item'
}
{
_id: 'item2',
name: 'Bar Item'
}
{
_id: 'item3',
name: 'Buz Item'
}
And that I perform this query
db['products'].aggregate([
{ $lookup: {
from: 'items',
localField: 'components.items.itemId',
foreignField: '_id',
as: 'componentItems'
} }
]);
I get this
{
_id: 'AAAA',
components: [
{ type: 'foo', items: [
{ itemId: 'item1', qty: 2 },
{ itemId: 'item2', qty: 1 }
] },
{ type: 'bar', items: [
{ itemId: 'item3', qty: 8 }
] }
],
componentItems: [ ]
}
Why doesn't the aggregation read the local field value? How can I retrieve the foreign document without losing my original document structure?
Edit
I have read the jira issue and seen the proposed answer, however I don't know how this applies. This is not merely an array, but values from an object, inside an array. I am not sure how I can unwind this, and how to put it back together without losing the document structure.
Edit 2
The problem that I have is that I'm not sure how to group the results back together. With this query :
db['products'].aggregate([
{ $unwind: '$components' },
{ $unwind: '$components.items' },
{ $lookup: {
from: 'items',
localField: 'components.items.itemId',
foreignField: '_id',
as: 'componentsItems'
} }
]);
I get the "correct" result of
{ "_id" : "AAAA", "components" : { "type" : "foo", "items" : { "itemId" : "item1", "qty" : 2 } }, "componentsItems" : [ { "_id" : "item1", "name" : "Foo Item" } ] }
{ "_id" : "AAAA", "components" : { "type" : "foo", "items" : { "itemId" : "item2", "qty" : 1 } }, "componentsItems" : [ { "_id" : "item2", "name" : "Bar Item" } ] }
{ "_id" : "AAAA", "components" : { "type" : "bar", "items" : { "itemId" : "item3", "qty" : 8 } }, "componentsItems" : [ { "_id" : "item3", "name" : "Buz Item" } ] }
But, while I can unwind components.items, I cannot seem to undo this, as $group complains that
"the group aggregate field name 'components.items' cannot be used because $group's field names cannot contain '.'"
db['products'].aggregate([
{ $unwind: '$components' },
{ $unwind: '$components.items' },
{ $lookup: {
from: 'items',
localField: 'components.items.itemId',
foreignField: '_id',
as: 'componentsItems'
} },
{ "$group": {
"components.type": "$components.type",
"components.items": { $push: "$components.items" },
"componentsItems": { $push: "$componentsItems" }
} },
{ "$group": {
"_id": "$_id",
"components": { $push: "$components" },
"componentsItems": { $push: "$componentsItems" }
} }
]);
Edit 3
This query is, thus far, the closest that I found, except that components are not grouped back by type.
db['products'].aggregate([
{ $unwind: '$components' },
{ $unwind: '$components.items' },
{ $lookup: {
from: 'items',
localField: 'components.items.itemId',
foreignField: '_id',
as: 'componentsItems'
} },
{ $unwind: '$componentsItems' },
{ $group: {
"_id": "$_id",
"components": {
$push: {
"type": "$components.type",
"items": "$components.items"
}
},
"componentsItems": { $addToSet: "$componentsItems" }
} }
]);
Also: I am concerned that using $unwind and $group may affect the order of the components, which should be preserved. AFAIK, MongoDB preserves array order when storing documents. I'd hate for this functionality to be broken by the awkwardness of $lookup.

Here is my long and awkward solution :
db['products'].aggregate([
// unwind all... because $lookup cannot work with multi-values
{ $unwind: '$components' },
{ $unwind: '$components.items' },
// lookup... This is a 1:1 relationship but who cares, right?
{ $lookup: {
from: 'items',
localField: 'components.items.itemId',
foreignField: '_id',
as: 'componentsItems'
} },
// our 1:1 relationship is now an array, so this is required
// before grouping, so we don't end up with array of arrays
{ $unwind: '$componentsItems' },
// Group 1: put "components.items" in a temporary array
// and filter duplicates from "componentsItems"
{ $group: {
"_id": {
"i": "$_id",
"t": "$components.type"
},
"items": {
$push: "$components.items"
},
"componentsItems": { $addToSet: "$componentsItems" }
} },
// undo $push...
{ $unwind: "$componentsItems" },
// Group 2: put everything back together
{ $group: {
"_id": "$_id.i",
"items": {
$push: {
"type": "$_id.t",
"items": "$items"
}
},
"componentsItems": { $push: "$componentsItems" }
} }
]);
Edit
A better solution :
db['products'].aggregate([
// Return document, added a collection of "itemId"
{ $project: {
"_id": 1,
"components": 1,
"componentItemId": "$components.items.itemId"
} },
// Since there were two nested arrays, the field is an array of arrays...
{ $unwind: "$componentItemId" },
{ $unwind: "$componentItemId" },
// make 1:1 lookup...
{ $lookup: {
from: 'items',
localField: 'componentItemId',
foreignField: '_id',
as: 'componentsItems'
} },
// ... extract the 1:1 reference...
{ $unwind: "$componentsItems" },
// group back, ignoring the "componentItemId" field
{ $group: {
"_id": "$_id",
"components": { $first: "$components" },
"componentItems": { $addToSet: "$componentsItems" }
}}
]);
I'm not sure whether there is a better solution yet, and I am concerned about performance, but this seems to be the only solution I can think of.
The downside is that documents cannot be dynamic, and this query will need to be modified whenever the schema changes.
Update
This seems to be resolved in MongoDB 3.3.4 (not released at the time of writing this answer).
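As a rough sketch of what that would look like, assuming a server version in which $lookup matches each element of an array-valued localField (the 3.3.4 change mentioned above), the original single-stage query should then fill componentItems without any $unwind or $group. The variable name is only for illustration, and the order of the joined documents may not follow the order of the components.

```javascript
// Sketch only: assumes $lookup expands array values reached through localField.
const componentLookup = [
  { $lookup: {
      from: 'items',
      localField: 'components.items.itemId',
      foreignField: '_id',
      as: 'componentItems'
  } }
];

// In the mongo shell: db['products'].aggregate(componentLookup);
```

Since no $unwind or $group is involved, the components array itself is left untouched.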

Related

MongoDB multiple $lookup and $group output

I'm quite a newbie with MongoDB and I'm trying to retrieve a kind-of leaderboard based on two related collections and a third one, referencing one of the two, based on its different property.
Consider a schema like the following one:
tree: { _id, company_id: string, company_name }
link: { _id, company_id: string, url: string }
analytics: { _id, tree_id: string, link_id: string, views: number, clicks: number, date: string }
An analytics document has either tree_id and views, or link_id and clicks, but not both at once.
What I'm trying to achieve right now is a kind of "leaderboard" of the total clicks + views, starting from the analytics collection, joining it with both tree and link, and finally retrieving the sum of clicks and views.
I have already managed to retrieve the sum of them for a specific company_id, with the following code
db.analytics.aggregate([{
$lookup: {
from: "trees",
as: "trees",
localField: "tree_id",
foreignField: "_id"
}
}, {
$lookup: {
from: "links",
as: "links",
localField: "link_id",
foreignField: "_id"
}
}, {
$match: {
$or: [
{"trees.company_id": "1"},
{"links.company_id": "1"}
]
}
}, {
$group: {
_id: null,
views_count: {
$sum: "$views"
},
clicks_count: {
$sum: "$clicks"
}
}
}])
But I can't find a way to get a list of results like
{ company_id: 1, company_name: "foo", clicks: 100, views: 200 },
{ company_id: 2, company_name: "bar", clicks: 200, views: 200 }
and so on.
What I've tried so far is grouping by different _id, which is not working as I would expect
db.analytics.aggregate([{
$lookup: {
from: "trees",
as: "trees",
localField: "tree_id",
foreignField: "_id"
}
}, {
$lookup: {
from: "links",
as: "links",
localField: "link_id",
foreignField: "_id"
}
}, {
$group: {
_id: "$trees.company_id",
views_count: {
$sum: "$views"
},
clicks_count: {
$sum: "$clicks"
}
}
}])
Which does not assign clicks_count to a specific entry, but outputs something like
{ "_id" : [ "1" ], "views_count" : 6, "clicks_count" : 0 }
{ "_id" : [ ], "views_count" : 0, "clicks_count" : 48 }
{ "_id" : [ "2" ], "views_count" : 10, "clicks_count" : 0 }
I'm not even sure that this schema could be the best solution, so I will also appreciate any design suggestions or similar stuff.
Based on the comment below, I tried to deconstruct trees before grouping results, but it ended up outputting only company_id and views_count, without counting clicks, as follows
{ "_id" : "2", "views_count" : 10, "clicks_count" : 0 }
{ "_id" : "1", "views_count" : 6, "clicks_count" : 0 }
$addFields adds a company field: if trees.company_id is not an empty array [], it takes trees, otherwise links
$arrayElemAt takes the first element of that array
$group groups by company_id and sums the counts
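db.analytics.aggregate([
  // join the tree and link documents, exactly as in the question
  { $lookup: { from: "trees", as: "trees", localField: "tree_id", foreignField: "_id" } },
  { $lookup: { from: "links", as: "links", localField: "link_id", foreignField: "_id" } },
  {
    $addFields: {
      // pick whichever lookup matched and take its first document
      company: {
        $arrayElemAt: [
          { $cond: [{ $ne: ["$trees.company_id", []] }, "$trees", "$links"] },
          0
        ]
      }
    }
  },
  {
    // one output document per company, with both counters summed
    $group: {
      _id: "$company.company_id",
      company_name: { $first: "$company.company_name" },
      views_count: { $sum: "$views" },
      clicks_count: { $sum: "$clicks" }
    }
  }
])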
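Because $lookup always produces an array (empty when nothing matches), another way to pick the company is to concatenate the two lookup results and take the first element. This is only a sketch, under the assumption taken from the question that each analytics document matches either a tree or a link but not both; the constant name is illustrative.

```javascript
// Sketch: assumes "trees" and "links" were produced by the two $lookup stages,
// so both are arrays and at most one of them is non-empty.
const pickCompanyStage = {
  $addFields: {
    company: {
      $arrayElemAt: [{ $concatArrays: ["$trees", "$links"] }, 0]
    }
  }
};

// Usage: place it between the $lookup stages and the $group stage.
```

This avoids the $cond comparison against [], at the cost of relying on the "one or the other" assumption.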

mongodb how to query referenced property and return primary collection with referenced property joined

Cat Collection:
{
"_id" : ObjectId("5ee8d0d16e4fec1ad4779249"),
"description" : ObjectId("5ea9af047d6a4f6480fd42f4")
}
Description Collection:
{
"_id" : ObjectId("5ea9af047d6a4f6480fd42f4"),
"color" : "ginger"
}
I'd like to obtain all documents in the Cat collection where Cat.description.color equals ginger, with the Cat.description property joined:
{
"_id" : ObjectId("5ee8d0d16e4fec1ad4779249"),
"description" : {
"_id" : ObjectId("5ea9af047d6a4f6480fd42f4"),
"color" : "ginger"
}
}
I have the following aggregate query which works, but seems inefficient due to the second $lookup. Given the $match provides us with the necessary Description objects, is there a better way ?
db.Description.aggregate(
{
$match:{
$and:[
{ color: 'ginger' }
]
}
},
{
$lookup: {
from: 'Cat',
let: { descId: '$_id' },
pipeline: [
{
$match: {
$expr: {
$eq: ['$description', '$$descId']
}
}
}
],
as: 'cat'
}
},
{
$unwind: '$cat'
},
{
$replaceRoot: {
newRoot: '$cat'
}
},
{
$lookup: {
from: 'Description',
localField: 'description',
foreignField: '_id',
as: 'description'
}
},
{
$unwind: '$description'
});
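You can aggregate on the Cat collection and get the expected result
db.Cat.aggregate([
  {
    $lookup: {
      // collection names are case-sensitive; these match the question
      from: "Description",
      let: { catId: "$description" },
      pipeline: [
        {
          $match: {
            $expr: {
              $and: [
                { $eq: [ "$_id", "$$catId" ] },
                { $eq: [ "$color", "ginger" ] }
              ]
            }
          }
        }
      ],
      as: "description"
    }
  },
  // drop cats whose description did not match, and turn the
  // one-element array into an embedded object
  { $unwind: "$description" }
])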
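An alternative sketch, assuming the collection names from the question (Cat and Description), uses the simpler localField/foreignField form of $lookup and filters on the joined colour afterwards. Whether this is faster depends on indexes and data sizes, so treat it as something to measure rather than a recommendation; the constant name is illustrative.

```javascript
// Sketch: join every cat to its description, then keep only the ginger ones.
const gingerCats = [
  { $lookup: {
      from: "Description",
      localField: "description",
      foreignField: "_id",
      as: "description"
  } },
  // one description per cat, so flatten the array into an object
  { $unwind: "$description" },
  { $match: { "description.color": "ginger" } }
];

// In the mongo shell: db.Cat.aggregate(gingerCats);
```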

Mongodb aggregate lookup for many collections to one nested output

I've 3 Collections:
School
{ "id" : { "$numberLong" : "100000" },
"name" : "School1" }
Faculty
{ "id" : { "$numberLong" : "100000" },
"schoolId" : { "$numberLong" : "100000" },
"name" : "Faculty1" }
Subject
{ "id" : { "$numberLong" : "100000" },
"name" : "Subject1" }
Assume there are many of these in each collection. I want to be able to serve an endpoint that takes in an ID and returns the full three-layer hierarchy (School->Faculty->Subject). How would I return all this data?
Something like:
{
id: 1,
name: "school1",
faculties: [{
id:1000,
name: "faculty1",
subjects: [
{id: 1, name: "sub1"},
{id: 2, name: "sub2"},
{id: 3, name: "sub3"}
]
}]
}
OK, after ages I actually got the solution, which is a lot simpler than the rabbit hole I'd gone down.
{ $match: {id: 100001}},
{ $lookup:
{
from: 'faculties',
localField: 'id',
foreignField: 'schoolId',
as: 'faculties',
}
},
{ $unwind: {
path: "$faculties",
preserveNullAndEmptyArrays: true
}
},
{ $lookup:
{
from: 'subjects',
localField: 'faculties.id',
foreignField: 'facultyId',
as: 'faculties.subjects',
}
}
Which returns the exact output I wanted. The key is the final lookup's as: 'faculties.subjects', which puts subjects inside faculties, which is the first child of schools.
If you need further nesting, you just extend the as path each time you go a level deeper, for instance as: 'faculties.subjects.students.names'.
db.School.aggregate(
// Pipeline
[
// Stage 1
{
$lookup: // Equality Match
{
from: "Faculty",
localField: "id",
foreignField: "schoolId",
as: "faculties"
}
},
// Stage 2
{
$unwind: {
path: "$faculties",
preserveNullAndEmptyArrays: false // optional
}
},
// Stage 3
{
$lookup: // Equality Match
{
from: "Subject",
localField: "faculties.id",
foreignField: "facultyId",
as: "faculties.subjects"
}
},
// Stage 4
{
$group: {
_id: {
id: '$id',
name: '$name'
},
faculties: {
$addToSet: '$faculties'
}
}
},
// Stage 5
{
$project: {
id: '$_id.id',
name: '$_id.name',
faculties: 1
}
},
]
);

Mongodb - $group with $addToSet and then $lookup

I've got the following query
db.getCollection('transportations').aggregate(
{
$group: {
_id: null,
departure_city_id: { $addToSet: "$departure.city_id" },
departure_station_id: { $addToSet: "$departure.station_id" }
}
}
);
and the result is
{
"_id" : null,
"departure_city_id" : [
ObjectId("5a2f5378334c4442ab5a63ea"),
ObjectId("59dae1efe408157cc1585fea"),
ObjectId("5a5bbfdc35628410f9fdcde9")
],
"departure_station_id" : [
ObjectId("5a2f53d1334c4442ab5a63ee"),
ObjectId("5a2f53c5334c4442ab5a63ed"),
ObjectId("5a5bc13435628410f9fdcdea")
]
}
Now I want to look up each departure_city_id in the "areas" collection to get the "name" of the area, and each departure_station_id in the "stations" collection to get the "name" of the station.
The result could be something like this
{
  "_id" : null,
  "departure_city_id" : [
    {
      _id: ObjectId("5a2f5378334c4442ab5a63ea"),
      name: "City 1"
    },
    {
      _id: ObjectId("59dae1efe408157cc1585fea"),
      name: "City 2"
    },
    {
      _id: ObjectId("5a5bbfdc35628410f9fdcde9"),
      name: "City 3"
    }
  ],
  "departure_station_id" : [
    {
      _id: ObjectId("5a2f53d1334c4442ab5a63ee"),
      name: "Station 1"
    },
    {
      _id: ObjectId("5a2f53c5334c4442ab5a63ed"),
      name: "Station 2"
    },
    {
      _id: ObjectId("5a5bc13435628410f9fdcdea"),
      name: "Station 3"
    }
  ]
}
The $lookup aggregation pipeline stage now works directly with an array (as of version 3.3.4).
See: lookup between local (multiple)array of values and foreign (single) value
The answer to the question is simply:
db.getCollection('transportations').aggregate([
  {
    // collect the distinct city and station ids
    $group: {
      _id: null,
      departure_city_id: { $addToSet: "$departure.city_id" },
      departure_station_id: { $addToSet: "$departure.station_id" }
    }
  },
  {
    // replace the array of city ids with the matching area documents
    $lookup: {
      from: "areas",
      localField: "departure_city_id",
      foreignField: "_id",
      as: "departure_city_id"
    }
  },
  {
    // replace the array of station ids with the matching station documents
    $lookup: {
      from: "stations",
      localField: "departure_station_id",
      foreignField: "_id",
      as: "departure_station_id"
    }
  }
])
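The two $lookup stages return whole area and station documents. If only _id and name are wanted, as in the desired output, a trailing $project can trim each array element. This is a sketch that assumes both collections store the label in a field called name, as the question implies; the constant name is illustrative.

```javascript
// Sketch: keeps only _id and name inside each looked-up array element.
const keepNamesStage = {
  $project: {
    "departure_city_id._id": 1,
    "departure_city_id.name": 1,
    "departure_station_id._id": 1,
    "departure_station_id.name": 1
  }
};

// Usage: append it as the last stage of the pipeline above.
```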

Aggregate pipeline Match -> Lookup -> Unwind -> Match issue

I am puzzled as to why the code below doesn't work. Can anyone explain, please?
For some context: My goal is to get the score associated with an answer option for a survey database where answers are stored in a separate collection from the questions. The questions collection contains an array of answer options, and these answer options have a score.
Running this query:
db.answers.aggregate([
{
$match: {
userId: "abc",
questionId: ObjectId("598be01d4efd70a81c1c5ad4")
}
},
{
$lookup: {
from: "questions",
localField: "questionId",
foreignField: "_id",
as: "question"
}
},
{
$unwind: "$question"
},
{
$unwind: "$question.options"
},
{
$unwind: "$answers"
}
])
I get:
{
"_id" : ObjectId("598e588e0c5e24452c9ee769"),
"userId" : "abc",
"questionId" : ObjectId("598be01d4efd70a81c1c5ad4"),
"answers" : {
"id" : 20
},
"question" : {
"_id" : ObjectId("598be01d4efd70a81c1c5ad4"),
"options" : {
"id" : 10,
"score" : "12"
}
}
}
{
"_id" : ObjectId("598e588e0c5e24452c9ee769"),
"userId" : "abc",
"questionId" : ObjectId("598be01d4efd70a81c1c5ad4"),
"answers" : {
"id" : 20
},
"question" : {
"_id" : ObjectId("598be01d4efd70a81c1c5ad4"),
"options" : {
"id" : 20,
"score" : "4"
}
}
}
All great. If I now add to the original query a match that's supposed to find the answer option having the same id as the answer (i.e. question.options.id == answers.id), things don't work as I would expect.
The final pipeline is:
db.answers.aggregate([
{
$match: {
userId: "abc",
questionId: ObjectId("598be01d4efd70a81c1c5ad4")
}
},
{
$lookup: {
from: "questions",
localField: "questionId",
foreignField: "_id",
as: "question"
}
},
{
$unwind: "$question"
},
{
$unwind: "$question.options"
},
{
$unwind: "$answers"
},
{
$match: {
"question.options.id": "$answers.id"
}
},
{
$project: {
_id: 0,
score: "$question.options.score"
}
}
])
This returns an empty result. But if I change the RHS of the $match from "$answers.id" to 20, it returns the expected score: 4. I tried everything I could think of, but couldn't get it to work and can't understand why it doesn't work.
I was able to get it to work with the following pipeline:
{
$match: {
userId: "abc",
questionId: ObjectId("598be01d4efd70a81c1c5ad4")
}
},
{
$lookup: {
from: "questions",
localField: "questionId",
foreignField: "_id",
as: "question"
}
},
{
$unwind: "$question"
},
{
$unwind: "$question.options"
},
{
$unwind: "$answers"
},
{
$addFields: {
areEqual: { $eq: [ "$question.options.id", "$answers.id" ] }
}
},
{
$match: {
areEqual: true
}
},
{
$project: {
_id: 0,
score: "$question.options.score"
}
}
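I think the reason it didn't work with a direct match is that $match takes a query document rather than an aggregation expression, so the "$answers.id" on the right-hand side is treated as a literal string instead of a reference to the field. That is also why replacing it with the literal 20 worked. Comparing two fields of the same document needs an expression such as $eq, hence the need to add an extra helper attribute.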
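On MongoDB 3.6 or later, the helper field can be avoided with $expr, which lets $match evaluate an aggregation expression. A minimal sketch, assuming the same field names as in the pipeline above (the constant name is only illustrative):

```javascript
// Sketch: compares two fields of the same document inside $match.
const matchAnswerOption = {
  $match: {
    $expr: { $eq: ["$question.options.id", "$answers.id"] }
  }
};

// Usage: put it in place of the $addFields + $match pair, after the three $unwind stages.
```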