Multiple joins through array field - mongodb

NoSQL beginner here.
For a current use case we are syncing an external relational DB (Dataverse) with a changing schema to a MongoDB instance. Because we have no control over the schema and each entity can change on its own, we aren't able to normalize the data, which leaves me needing nested lookups from an array.
I have created a sample playground to illustrate a rough example of the data structure: https://mongoplayground.net/p/ObWOC5choPq
I need to return the data in roughly the following format:
{
orderID,
products: [
{
productID,
pictureURL,
},
{
productID,
pictureURL,
}
]
}
I'm able to perform both lookups separately, but I'm not able to return the picture data as part of a product object in the products array. Could anyone point me in the right direction?
Kind regards,
Nomis

Maybe something like this:
db.orders.aggregate([
{
"$lookup": {
"from": "products",
"localField": "products.productID",
"foreignField": "_id",
"as": "products"
}
},
{
$unwind: {
path: "$products",
preserveNullAndEmptyArrays: true
}
},
{
"$lookup": {
"from": "pictures",
"localField": "products.pictureID",
"foreignField": "_id",
"as": "products.pictures"
}
},
{
$unwind: {
path:"$products.pictures",
preserveNullAndEmptyArrays: true
}
},
{
$project: {
_id: 1,
products: {
productID: "$products._id",
pictureURL: "$products.pictures.bloburl"
}
}
},
{
$group: {
_id: "$_id",
products: {
$push: "$products"
}
}
},
{
$project: {
orderID: "$_id",
products: 1,
_id: 0
}
}
])
$lookup to join orders with products
$unwind to convert the products array into one object per document
$lookup to join pictures, nested inside products
$unwind to convert pictures into an object
$project the fields as expected in the final result
$group by the order _id
$project one more time to rename the order _id to orderID
playground
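To make the target shape concrete, here is a rough in-memory sketch in plain JavaScript of what the pipeline above is meant to produce. It does not talk to MongoDB. The field names (productID, pictureID, bloburl) are taken from the pipeline, while the sample ids, the URLs and the helper name shapeOrder are assumptions made up for illustration.

```javascript
// In-memory stand-ins for the three collections (sample values are made up).
const orders = [
  { _id: "o1", products: [{ productID: "p1" }, { productID: "p2" }] },
];
const products = [
  { _id: "p1", pictureID: "pic1" },
  { _id: "p2", pictureID: "pic2" },
];
const pictures = [
  { _id: "pic1", bloburl: "https://example.com/1.png" },
  { _id: "pic2", bloburl: "https://example.com/2.png" },
];

// Index the joined collections by _id, the role "foreignField" plays in $lookup.
const productById = new Map(products.map((p) => [p._id, p]));
const pictureById = new Map(pictures.map((p) => [p._id, p]));

// Hypothetical helper: builds one { orderID, products: [...] } object per order.
function shapeOrder(order) {
  return {
    orderID: order._id,
    products: order.products
      .map((ref) => productById.get(ref.productID))
      .filter((product) => product !== undefined) // unmatched references are skipped
      .map((product) => ({
        productID: product._id,
        // Optional chaining leaves pictureURL undefined when no picture matches.
        pictureURL: pictureById.get(product.pictureID)?.bloburl,
      })),
  };
}

const result = orders.map(shapeOrder);
console.log(JSON.stringify(result, null, 2));
```

Each order ends up with one entry per matched product, and the picture URL sits inside that entry rather than in a separate array, which is the part the two separate lookups could not deliver on their own.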

Related

How do I use a wildcard in my lookup foreignField?

I'm trying to make a lookup, where the foreignField is dynamic:
{
$match: {
_id: ObjectId('61e56339b528bf009feca149')
}
},
{
$lookup: {
from: 'computer',
localField: '_id',
foreignField: 'configs.?.refId',
as: 'computers'
}
}
I know that the foreignField always starts with configs and ends with refId, but the string between the two is dynamic.
Here is an example of what a document looks like:
'_id': ObjectId('6319bd1540b41d1a35717a16'),
'name': 'MyComputer',
'configs': {
'ybe': {
'refId': ObjectId('61e56339b528bf009feca149'),
'name': 'Ybe Config'
},
'test': {
'refId': ObjectId('61f3d7ec47805d1443f14540'),
'name': 'TestConfig'
},
...
}
As you can see the configs property contains different objects with different names ('ybe', 'test', etc...). I want to lookup based on the refId inside of all of those objects.
How do I achieve that?
Using a dynamic value as a field name is considered an anti-pattern and introduces unnecessary complexity to querying. However, you can achieve this behaviour with $objectToArray by converting the object into an array of k-v pairs and performing the $match in a sub-pipeline.
db.coll.aggregate([
{
"$lookup": {
"from": "computer",
"let": {
id: "$_id"
},
"pipeline": [
{
$set: {
configs: {
"$objectToArray": "$configs"
}
}
},
{
"$unwind": "$configs"
},
{
$match: {
$expr: {
$eq: [
"$$id",
"$configs.v.refId"
]
}
}
}
],
"as": "computers"
}
}
])
MongoPlayground
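If it helps to see what the conversion does, here is a small plain-JavaScript sketch of the same idea outside MongoDB. The objectToArray and referencesId functions are illustrative helpers written for this example, not MongoDB APIs, and plain strings stand in for the ObjectId values.

```javascript
// Sample document shaped like the one in the question.
const computer = {
  _id: "6319bd1540b41d1a35717a16",
  name: "MyComputer",
  configs: {
    ybe: { refId: "61e56339b528bf009feca149", name: "Ybe Config" },
    test: { refId: "61f3d7ec47805d1443f14540", name: "TestConfig" },
  },
};

// Roughly what $objectToArray produces: one { k, v } pair per key.
function objectToArray(obj) {
  return Object.entries(obj).map(([k, v]) => ({ k, v }));
}

// True when any config, whatever its key is called, points at the given id.
function referencesId(doc, id) {
  return objectToArray(doc.configs).some((pair) => pair.v.refId === id);
}

const pairs = objectToArray(computer.configs);
console.log(pairs.map((pair) => pair.k)); // the dynamic key names are now ordinary data
console.log(referencesId(computer, "61e56339b528bf009feca149"));
```

Once the dynamic keys have become values under k, the match only needs the fixed path v.refId. Note one difference from the pipeline: because it uses $unwind, the pipeline returns one row per matching config, whereas this sketch only answers yes or no per document.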

mongoDB group documents after unwind

In my mongodb books collection I have documents that look like:
{
_id: ObjectId(625efa44f1ba751c8275ea51),
contributors:[ObjectId(625efa44f1ba751c8275ea52), ObjectId(625efa44f1ba751c8275ea53)]
//other fields
}
And I want to do a query that returns me documents like:
{
_id: ObjectId(625efa44f1ba751c8275ea51),
contributors:[
{
_id: ObjectId(625efa44f1ba751c8275ea52),
first_name: 'Luigi'
//many other fields
},
{
_id: ObjectId(625efa44f1ba751c8275ea53),
first_name: 'Mario'
//many other fields
},
]
//other fields
}
I did an unwind on contributors and a lookup with my users collection and now I need to group them. I haven't used it before but I did something like:
{
$group:{
_id: '_id'
}
}
But I don't know what to do next in order to preserve all the fields from books and also from users.
Do you have any idea?
If the $lookup result is small enough (under the 16 MB document size limit), you can simply do a $lookup.
db.books.aggregate([
{
"$lookup": {
"from": "users",
"localField": "contributors",
"foreignField": "_id",
"as": "contributors"
}
}
])
Here is the Mongo Playground for your reference.
If the $lookup result will exceed the 16 MB limit, you can still $unwind. Just use $first to regroup the other fields after the $unwind.
db.books.aggregate([
{
"$lookup": {
"from": "users",
"localField": "contributors",
"foreignField": "_id",
"as": "contributors"
}
},
{
"$unwind": "$contributors"
},
{
"$group": {
"_id": "$_id",
bookName: {
$first: "$bookName"
},
"contributors": {
"$push": "$contributors"
}
}
}
])
Here is the Mongo playground for your reference.
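As a rough illustration of what the regrouping does, here is a plain-JavaScript sketch that works on in-memory rows instead of a collection. The regroup function and the sample values are assumptions for the example; only the bookName and contributors field names come from the pipeline above.

```javascript
// Rows as they might look after $unwind: one row per (book, contributor) pair.
const unwound = [
  { _id: "b1", bookName: "Book One", contributors: { _id: "u1", first_name: "Luigi" } },
  { _id: "b1", bookName: "Book One", contributors: { _id: "u2", first_name: "Mario" } },
  { _id: "b2", bookName: "Book Two", contributors: { _id: "u1", first_name: "Luigi" } },
];

// Hypothetical helper mimicking $group with $first and $push.
function regroup(rows) {
  const groups = new Map();
  for (const row of rows) {
    if (!groups.has(row._id)) {
      // Like $first: keep the non-array fields from the first row seen for this _id.
      groups.set(row._id, { ...row, contributors: [] });
    }
    // Like $push: collect every contributor that belongs to this book.
    groups.get(row._id).contributors.push(row.contributors);
  }
  return [...groups.values()];
}

const books = regroup(unwound);
console.log(JSON.stringify(books, null, 2));
```

The sketch copies every field of the first row in one go. In a real pipeline the closest equivalent is probably $first on "$$ROOT" rather than listing each field, but treat that as a suggestion to try, not something verified against your data.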

Array is reordered when using $lookup

I have this aggregation:
db.getCollection("users").aggregate([
{
"$match": {
"_id": "5a708a38e6a4078bd49f01d5"
}
},
{
"$lookup": {
"from": "user-locations",
"localField": "locations",
"as": "locations",
"foreignField": "_id"
}
}
])
It works well, but there is one small thing that I don't understand and I can't fix.
In the query output, the locations array is reordered by ObjectId and I really need to keep the original order of data.
Here is how the locations array from the users collection looks like
'locations' : [
ObjectId("5b55e9820b720a1a7cd19633"),
ObjectId("5a708a38e6a4078bd49ef13f")
],
And here is the result after the aggregation:
'locations' : [
{
'_id' : ObjectId("5a708a38e6a4078bd49ef13f"),
'name': 'Location 2'
},
{
'_id' : ObjectId("5b55e9820b720a1a7cd19633"),
'name': 'Location 1'
}
],
What am I missing here? I really have no idea how to proceed with this issue.
Could you give me a push?
$lookup does not guarantee the order of the result documents. You can try the following approach to keep the original order of the array:
$unwind to deconstruct the locations array and add an index field, which starts from 0
$lookup with locations
$set to select the first element from locations
$sort by the index field in ascending order
$group by _id and reconstruct the locations array
db.users.aggregate([
{ $match: { _id: "5a708a38e6a4078bd49f01d5" } },
{
$unwind: {
path: "$locations",
includeArrayIndex: "index"
}
},
{
$lookup: {
from: "user-locations",
localField: "locations",
foreignField: "_id",
as: "locations"
}
},
{ $set: { locations: { $arrayElemAt: ["$locations", 0] } } },
{ $sort: { index: 1 } },
{
$group: {
_id: "$_id",
locations: { $push: "$locations" }
}
}
])
Playground
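The intent of those stages can be sketched in plain JavaScript on in-memory data: walk the original id array and pick up each matching document, so the array's own order is what drives the result. This is only an illustration, not MongoDB code. The sample documents mirror the question, with plain strings standing in for ObjectId values.

```javascript
// The user document, with locations in the order that must be kept.
const user = {
  _id: "5a708a38e6a4078bd49f01d5",
  locations: ["5b55e9820b720a1a7cd19633", "5a708a38e6a4078bd49ef13f"],
};

// The joined collection, stored in a different order, as in the question.
const userLocations = [
  { _id: "5a708a38e6a4078bd49ef13f", name: "Location 2" },
  { _id: "5b55e9820b720a1a7cd19633", name: "Location 1" },
];

const locationById = new Map(userLocations.map((loc) => [loc._id, loc]));

// Walking the original array keeps its order, which is what the
// $unwind (with index) -> $lookup -> $sort -> $group sequence restores.
const ordered = user.locations
  .map((id) => locationById.get(id))
  .filter((loc) => loc !== undefined); // ids without a match are dropped

console.log(ordered.map((loc) => loc.name));
```

If the documents are small, doing this reordering in application code after a plain $lookup may be simpler than the extra pipeline stages.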
From this closed bug report:
When using $lookup, the order of the documents returned is not guaranteed. The documents are returned in "natural order" - as they are encountered in the database. The only way to get a guaranteed consistent order is to add a $sort stage to the query.
Basically, any Mongo query or pipeline returns documents in the order they were matched, meaning the "right" order is not guaranteed, especially if there's index usage involved.
What you should do is add a $sort stage as suggested, like so:
db.collection.aggregate([
{
"$match": {
"_id": "5a708a38e6a4078bd49f01d5"
}
},
{
"$lookup": {
"from": "user-locations",
"let": {
"locations": "$locations"
},
"pipeline": [
{
"$match": {
"$expr": {
"$setIsSubset": [
[
"$_id"
],
"$$locations"
]
}
}
},
{
$sort: {
_id: 1 // any other sort field you want.
}
}
],
"as": "locations",
}
}
])
You can also keep the original $lookup syntax you're using and just $unwind, $sort and then $group to restore the structure.

MongoDB $lookup if the local field exists

I have these entities:
// collectionA
{
key: "value",
ref: SOME-OBJECT-ID
}
// collectionB
{
_id: SOME-OBJECT-ID
key1: "value1"
}
If ref exists in the collectionA entity, I want to look it up in collectionB and bring in its data.
If the ref key is missing, or it is present but the matching entity in collectionB is missing, I get an empty result from the whole aggregate query.
This is the aggregate query:
{ $match },
{
$lookup: {
from: "collectionB",
let: {
ref: "$ref"
},
pipeline: [
{
$match: {
$expr: {
$eq: [
"$_id", "$$ref"
]
}
}
},
{
$project: {
key1: 1
}
}
],
as: "someData"
}
}
How can I avoid this or add any conditional $lookup?
One way of doing that is to add another $match at the beginning, so that documents without ref are skipped at the source.
To skip documents that have no match in B, you can filter them out at the end.
{$match:{ ref:{$exists:true}}}
It will consider only documents where ref exists.
play
db.A.aggregate([
{
"$match": {
ref: {
$exists: true
}
}
},
{
"$lookup": {
"from": "B",
"localField": "ref",
"foreignField": "_id",
"as": "output"
}
}
])
But you don't need to do this unless you have a specific use case, as it will not have much impact.
I have found it. The document was not selected because I have used the $unwind - and it won't return the document if we are trying to do it on an empty array. So this is the fix:
{
$unwind: {
path: "$ref",
preserveNullAndEmptyArrays: true
}
}
Instead of:
{
$unwind: "$ref"
}
I found preserveNullAndEmptyArrays in this answer: How to get all result if unwind field does not exist in mongodb
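For anyone who hits the same thing, here is a simplified plain-JavaScript model of how $unwind treats missing, null and empty-array values with and without the option. It is a sketch for a single top-level field and not MongoDB's actual implementation. The unwind function is written for this example, and the someData field name is borrowed from the "as" of the $lookup in the question.

```javascript
// Simplified model of $unwind for a top-level field.
function unwind(docs, field, preserveNullAndEmptyArrays = false) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    const nothingToUnwind =
      value === undefined ||
      value === null ||
      (Array.isArray(value) && value.length === 0);
    if (nothingToUnwind) {
      // Default behaviour: the whole document silently disappears.
      if (preserveNullAndEmptyArrays) {
        const copy = { ...doc };
        if (Array.isArray(value)) delete copy[field]; // an empty array is left out of the output
        out.push(copy);
      }
      continue;
    }
    // A non-array value is treated like a one-element array.
    const items = Array.isArray(value) ? value : [value];
    for (const item of items) out.push({ ...doc, [field]: item });
  }
  return out;
}

const docs = [
  { _id: 1, someData: [{ key1: "value1" }] },
  { _id: 2, someData: [] }, // no match in the joined collection
];

const strict = unwind(docs, "someData");
const lenient = unwind(docs, "someData", true);
console.log(strict.map((d) => d._id), lenient.map((d) => d._id));
```

With the default setting, the document whose lookup came back empty is gone from the result. With the option enabled it survives, just without the unwound field.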

mongodb 2 level aggregate lookup

I have these collection schemas
Schema.users = {
name : "string",
username : "string",
[...]
}
Schema.rooms = {
name : "string",
hidden: "boolean",
user: "string",
sqmt: "number",
service: "string"
}
Schema.room_price = {
morning : "string",
afternoon: "string",
day: "string",
room:'string'
}
I need to aggregate the users with the rooms and, for each room, the specific room prices.
The expected result would be
[{
_id:"xXXXXX",
name:"xyz",
username:"xyz",
rooms:[
{
_id: 1111,
name:'room1',
sqmt: '123x',
service:'ppp',
room_prices: [{morning: 123, afternoon: 321}]
}
]}]
The first part of the aggregate could be
db.collection('users').aggregate([
{$match: cond},
{$lookup: {
from: 'rooms',
let: {"user_id": "$_id"},
pipeline: [{$match:{$expr: {$eq: ["$user", "$$user_id"]}}}],
as: "rooms"
}}])
but I can't figure out how to get the room prices within the same aggregate
Presuming that room from the room_prices collection holds data matching the name from the rooms collection, that would be the expression to match on for the "inner" pipeline of the $lookup expression, with yet another $lookup:
db.collection('users').aggregate([
{ $match: cond },
{ $lookup: {
from: 'rooms',
let: { "user_id": "$_id" },
pipeline: [
{ $match:{ $expr: { $eq: ["$user", "$$user_id"] } } },
{ $lookup: {
from: 'room_prices',
let: { 'name': '$name' },
pipeline: [
{ $match: { $expr: { $eq: [ '$room', '$$name'] } } },
{ $project: { _id: 0, morning: 1, afternoon: 1 } }
],
as: 'room_prices'
}}
],
as: "rooms"
}}
])
That's also adding a $project in there to select only the fields you want from the prices. When using the expressive form of $lookup you actually do get to express a "pipeline", which can be any aggregation pipeline combination. This allows for complex manipulation and such "nested lookups".
Note that using mongoose you can also get the collection name from the model object using something like:
from: RoomPrice.collection.name
This is generally future proofing against possible model configuration changes which might possibly change the name of the underlying collection.
You can also do pretty much the same with the "legacy" form of $lookup prior to the sub-pipeline syntax available from MongoDB 3.6 and upwards. It's just a bit more processing and reconstruction:
db.collection('users').aggregate([
{ $match: cond },
// in legacy form
{ $lookup: {
from: 'rooms',
localField: '_id',
foreignField: 'user',
as: 'rooms'
}},
// unwind the output array
{ $unwind: '$rooms' },
// lookup for the second collection
{ $lookup: {
from: 'room_prices',
localField: 'rooms.name',
foreignField: 'room',
as: 'rooms.room_prices'
}},
// Select array fields with $map
{ $addFields: {
'rooms': {
'room_prices': {
$map: {
input: '$rooms.room_prices',
in: {
morning: '$$this.morning',
afternoon: '$$this.afternoon'
}
}
}
}
}},
// now group back to 'users' data
{ $group: {
_id: '$_id',
name: { $first: '$name' },
username: { $first: '$username' },
// same for any other fields, then $push 'rooms'
rooms: { $push: '$rooms' }
}}
])
That's a bit more overhead, mostly from the usage of $unwind. Note also that the "field selection" here means you return the "whole documents" from room_prices "first", and only after that is complete can you select the fields.
So there are advantages to the newer syntax, but it still could be done with earlier versions if you wanted to.