I want to join two MongoDB collections, collectionA and collectionB.
For each document in collectionA, I want to check whether a match exists in collectionB.
With $lookup, all matching documents are joined, but I would like the search in collectionB to stop as soon as one match is found (similar to a MongoDB findOne). My concern is performance; I know I could simply take element 0 of the resulting array.
Is there a way to do this using the MongoDB aggregation framework?
Example:
collectionA:
[
{
"_id": 1,
"item": "almonds"
},
{
"_id": 2,
"item": "pecans"
}
]
colectionB:
[
{
"_fid": 1,
"date": "2021-01-10"
},
{
"_fid": 1,
"date": "2021-01-11"
},
{
"_fid": 1,
"date": "2021-01-12"
},
{
"_fid": 2,
"date": "2021-01-03"
}
]
$lookup mongoDb
db.colectionA.aggregate([
{
"$lookup": {
"from": "colectionB",
"localField": "_id",
"foreignField": "_fid",
"as": "matches"
}
}
])
Result
[
{
"_id": 1,
"item": "almonds",
"matches": [
/* I don't want this array, with 1 element would be enough */
{
"_fid": 1,
"_id": ObjectId("5a934e000102030405000002"),
"date": "2021-01-10"
},
{
"_fid": 1,
"_id": ObjectId("5a934e000102030405000003"),
"date": "2021-01-11"
},
{
"_fid": 1,
"_id": ObjectId("5a934e000102030405000004"),
"date": "2021-01-12"
}
]
},
{
"_id": 2,
"item": "pecans",
"matches": [
{
"_fid": 2,
"_id": ObjectId("5a934e000102030405000005"),
"date": "2021-01-03"
}
]
}
]
You can test on this mongo playground.
Thanks in advance
If you're using at least MongoDB 3.6, you can execute an aggregation pipeline on a joined collection. It might look like this:
db.colectionA.aggregate([
{
"$lookup": {
"from": "colectionB",
"as": "matches",
"let": {
"fid": "$_id"
},
"pipeline": [
{
"$match": {
"$expr": {
"$eq": [
"$_fid",
"$$fid"
]
}
}
},
{
"$limit": 1
}
]
}
}
])
Working Mongo playground
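On MongoDB 5.0 or later, the same idea can probably be written more compactly, because $lookup accepts localField/foreignField together with a pipeline. The following is a minimal sketch, not a tested solution: it assumes an index on colectionB._fid exists, and hasMatch is a hypothetical field name introduced here only for illustration.

```javascript
// Sketch only: needs MongoDB 5.0+ (localField/foreignField combined with pipeline).
const firstMatchPipeline = [
  {
    $lookup: {
      from: "colectionB",
      localField: "_id",
      foreignField: "_fid",
      pipeline: [
        { $limit: 1 },            // keep at most one joined document
        { $project: { _id: 1 } }  // only existence matters, so drop the other fields
      ],
      as: "matches"
    }
  },
  // hasMatch (hypothetical name) is true when at least one document was joined
  { $addFields: { hasMatch: { $gt: [{ $size: "$matches" }, 0] } } },
  // remove the helper array from the output
  { $project: { matches: 0 } }
];

// In the shell: db.colectionA.aggregate(firstMatchPipeline)
```

The $limit caps how many joined documents are returned per input document; how much work the server actually saves depends on the index and the server version.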
Related
I have a collection containing data like the following and want to produce the desired output shown below.
db={
collectionA: [
{
"id": ObjectId("63b7c24c06ebe7a8fd11777b"),
"uniqueRefId": "UUID-2023-0001",
"products": [
{
"productIndex": 1,
"productCategory": ObjectId("63b7c24c06ebe7a8fd11777b"),
"productOwners": [
ObjectId("63b7c2fd06ebe7a8fd117781")
]
},
{
"productIndex": 2,
"productCategory": ObjectId("63b7c24c06ebe7a8fd11777b"),
"productOwners": [
ObjectId("63b7c2fd06ebe7a8fd117781"),
ObjectId("63b7c12706ebe7a8fd117778")
]
},
{
"productIndex": 3,
"productCategory": "",
"productOwners": ""
}
]
}
],
collectionB: [
{
"_id": ObjectId("63b7c2fd06ebe7a8fd117781"),
"fullname": "Jim Corbett",
"email": "jim.corbett#pp.com"
},
{
"_id": ObjectId("63b7c12706ebe7a8fd117778"),
"fullname": "Carry Minatti",
"email": "carry.minatty#pp.com"
},
]
}
Desired output = [
{
"id": ObjectId("507f1f77bcf86cd799439011"),
"uniqueRefId": "UUID-2023-0001",
"products": [
{
"productIndex": 1,
"productCategory": ObjectId('614g2f77bff86cd755439021'),
"productOwners": [
{
"_id": ObjectId("63ac1e59c0afb8b6f2d41acd"),
"fullname": "Jim Corbett",
"email": "jim.corbett#pp.com"
}
]
},
{
"productIndex": 2,
"productCategory": ObjectId('614g2f77bff86cd755439021'),
"productOwners": [
{
"_id": ObjectId("63ac1e59c0afb8b6f2d41acd"),
"fullname": "Jim Corbett",
"email": "jim.corbett#pp.com"
},
{
"_id": ObjectId("63ac1e59c0afb8b6f2d41ace"),
"fullname": "Carry Minatti",
"email": "carry.minatty#pp.com"
}
]
},
{
"productIndex": 3,
"productCategory": "",
"productOwners": ""
}
]
}
]
collectionA contains other documents as well, not just this one.
Similarly, collectionB contains other documents too.
How can I get the desired output?
I am looking for a MongoDB query that produces it.
I have implemented the lookup as follows:
db.collectionA.aggregate([
{
"$lookup": {
"from": "collectionB",
"localField": "products.productOwners",
"foreignField": "_id",
"as": "inventory_docs"
}
}
])
You can try this:
db.collectionA.aggregate([
{
"$unwind": "$products"
},
{
"$lookup": {
"from": "collectionB",
"localField": "products.productOwners",
"foreignField": "_id",
"as": "products.productOwners"
}
},
{
"$group": {
"_id": {
id: "$id",
uniqueRefId: "$uniqueRefId"
},
"products": {
"$push": "$products"
}
}
},
{
"$project": {
id: "$_id.id",
uniqueRefId: "$_id.uniqueRefId",
products: 1,
_id: 0
}
}
])
Playground link.
In this query, we do the following:
First, we unwind the products array using $unwind.
Then we resolve productOwners using $lookup.
Then we regroup the unwound elements using $group.
Finally, we project the desired output using $project.
Collection A:
[{
"_id": 1,
"operation":"SEC",
"name":"x"
},{
"_id": 2,
"operation": "SEC",
"name": "y"
},
{
"_id": 3,
"operation": "SEC",
"name": "z"
}]
Collection B:
[{
"user": 1,
"operation":"SEC",
"name":"x",
"date": "2022-10-25"
},{
"user": 2,
"operation":"SEC",
"name":"y",
"date": "2022-10-25"
}
]
Expected output:
[
{
"_id": 3,
"operation": "SEC",
"name": "z"
}
]
I have two collections. I want to match the first collection against the second by date and get only the documents that are not in the second collection.
You can use the following aggregation pipeline to achieve your desired output:
[
{
"$lookup": {
"from": "collectionB",
"localField": "_id",
"foreignField": "user",
"as": "collectionB"
}
},
{
$match: {
collectionB: {
$size: 0
}
}
},
{
$project: {
collectionB: 0
}
}
]
Please note that the $lookup runs for every document that reaches it, so you should probably add a $match stage at the beginning of the pipeline to limit the input.
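As a rough illustration of that advice, the pipeline below filters first and also caps the joined array at one element, since only its emptiness is checked. It is a sketch under assumptions: the initial filter on operation is a guess at what a useful $match might be, and combining localField/foreignField with pipeline requires MongoDB 5.0 or later.

```javascript
// Sketch only: needs MongoDB 5.0+ for localField/foreignField combined with pipeline.
const notInBPipeline = [
  // Assumed filter; replace with whatever narrows collection A in practice
  { $match: { operation: "SEC" } },
  {
    $lookup: {
      from: "collectionB",
      localField: "_id",
      foreignField: "user",
      pipeline: [
        { $limit: 1 },            // one joined document is enough to prove a match
        { $project: { _id: 1 } }  // keep the joined document small
      ],
      as: "collectionB"
    }
  },
  // keep only documents for which nothing was joined
  { $match: { collectionB: { $size: 0 } } },
  { $project: { collectionB: 0 } }
];

// In the shell: db.collectionA.aggregate(notInBPipeline)
```

With the sample data above, users 1 and 2 appear in collection B, so only the document with _id 3 should remain.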
Using the MongoDB aggregation framework, suppose I want to $lookup a set of results in another collection with one condition and, if that returns no results, fall back to the results of another condition. This is what I have.
srp collection
[
{
"_id": ObjectId("5fb6727790f41fef3ee7db87"),
"dates_e": {
"from": ISODate("2021-10-01"),
"to": ISODate("2021-10-03")
}
},
{
"_id": ObjectId("5f034bfa4c0000abdfc7df2e"),
"dates_e": {
"from": ISODate("2021-03-10"),
"to": ISODate("2021-03-15")
}
},
..
]
uth collection
[
{
"_id": ObjectId("5fb6727790f41fef3ee7db88"),
"dateTime": ISODate("2021-10-01"),
"res": 1.7
},
{
"_id": ObjectId("5fb6727790f41fef3ee7db89"),
"dateTime": ISODate("2021-10-02"),
"res": 0.5
},
..
]
The aggregation query (on the srp collection):
[
{
$match: { "_id": ObjectId("5fb6727790f41fef3ee7db87") }
},
{
$lookup: {
"from": "uth",
"let": {
"fromDate": "$dates_e.from",
"toDate": "$dates_e.to"
},
"pipeline": [
{
$match: {
$expr: {
$and: [
{
$gte: ["$varData_e.dateTime", "$$fromDate"]
},
{
$lt: ["$varData_e.dateTime", "$$toDate"]
}
]
}
}
}
],
"as": "uth_e"
}
}
]
Which would return:
[
{
"_id": ObjectId("5fb6727790f41fef3ee7db87"),
"dates_e": {
"from": ISODate("2021-10-01"),
"to": ISODate("2021-10-03")
},
"uth_e": [
{
"_id": ObjectId("5fb6727790f41fef3ee7db88"),
"dateTime": ISODate("2021-10-01"),
"res": 1.7
},
{
"_id": ObjectId("5fb6727790f41fef3ee7db89"),
"dateTime": ISODate("2021-10-02"),
"res": 0.5
},
{
"_id": ObjectId("5fb6727790f41fef3ee7db90"),
"dateTime": ISODate("2021-10-03"),
"res": 2.8
}
]
}
]
So this works just fine. However if the $match was "_id": ObjectId("5f034bfa4c0000abdfc7df2e") and there weren't any results returned (on the $lookup from uth) then I would like to return a set of results for a broader condition:
[
{
$match: { "_id": ObjectId("5f034bfa4c0000abdfc7df2e") }
},
{
$lookup: {
"from": "uth",
"let": {
"fromDate": "$dates_e.from",
"toDate": "$dates_e.to"
},
"pipeline": [
{
$match: {
$expr: {
$gte: ["$varData_e.dateTime", "$$fromDate"]
}
}
}
],
"as": "uth_e"
}
}
]
Any help appreciated!
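One possible approach, offered as a sketch rather than a verified answer: every document that satisfies the narrow condition also satisfies the broad one, so a single $lookup with the broad condition could be followed by a $filter that picks the narrow subset and falls back to the full set when that subset is empty. The code assumes the uth documents store the date in a top-level dateTime field, as in the sample data (the queries above use varData_e.dateTime instead); buildFallbackPipeline, uth_broad and narrow are hypothetical names.

```javascript
// Sketch only: srpId is whatever _id value you would pass to the first $match.
function buildFallbackPipeline(srpId) {
  return [
    { $match: { _id: srpId } },
    {
      // Broad condition only: everything on or after dates_e.from
      $lookup: {
        from: "uth",
        let: { fromDate: "$dates_e.from" },
        pipeline: [
          { $match: { $expr: { $gte: ["$dateTime", "$$fromDate"] } } }
        ],
        as: "uth_broad"
      }
    },
    {
      $addFields: {
        uth_e: {
          $let: {
            // narrow = the broad results that also fall before dates_e.to
            vars: {
              narrow: {
                $filter: {
                  input: "$uth_broad",
                  as: "u",
                  cond: { $lt: ["$$u.dateTime", "$dates_e.to"] }
                }
              }
            },
            // use the narrow set when it is non-empty, otherwise the broad one
            in: {
              $cond: [{ $gt: [{ $size: "$$narrow" }, 0] }, "$$narrow", "$uth_broad"]
            }
          }
        }
      }
    },
    { $project: { uth_broad: 0 } }
  ];
}

// In the shell: db.srp.aggregate(buildFallbackPipeline(ObjectId("5f034bfa4c0000abdfc7df2e")))
```

The trade-off is that the broad result set is always fetched, which may be large if many uth documents fall after dates_e.from.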
I have two collections named listings and moods.
listings sample:
{
"_id": ObjectId("5349b4ddd2781d08c09890f3"),
"name": "Hotel Radisson Blu",
"moods": [
ObjectId("507f1f77bcf86cd799439010"),
ObjectId("507f1f77bcf86cd799439011")
]
}
moods sample:
{
"_id": ObjectId("507f1f77bcf86cd799439011"),
"name": "Sports"
},
{
"_id": ObjectId("507f1f77bcf86cd799439010"),
"name": "Spanish Food"
},
{
"_id": ObjectId("507f1f77bcf86cd799439009"),
"name": "Action"
}
I need this output:
{
"_id": ObjectId("507f1f77bcf86cd799439011"),
"name": "Sports",
"count": 1
},
{
"_id": ObjectId("507f1f77bcf86cd799439010"),
"name": "Spanish Food",
"count": 1
},
{
"_id": ObjectId("507f1f77bcf86cd799439009"),
"name": "Action",
"count": 0
}
I need records of this type. I am not familiar with aggregate.
You can do it using aggregate():
$lookup to join the listings collection
$match in the lookup pipeline to check whether the mood's _id is in the listing's moods array
db.moods.aggregate([
{
"$lookup": {
"from": "listings",
"as": "count",
let: { id: "$_id" },
pipeline: [
{
"$match": {
"$expr": { "$in": ["$$id", "$moods"] }
}
}
]
}
},
// $addFields: replace count with the $size of the count array produced by the $lookup above
{
$addFields: {
count: { $size: "$count" }
}
}
])
Playground
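If the listings themselves are not needed, the counting can probably be moved into the lookup pipeline with $count, so that each mood carries at most one small document instead of an array of full listings. This is a sketch under the same assumptions as the answer above; n is a hypothetical field name. Because $count emits no document when nothing matches, $ifNull supplies the 0.

```javascript
// Sketch only: same join as above, but counting inside the sub-pipeline.
const moodCountPipeline = [
  {
    $lookup: {
      from: "listings",
      let: { id: "$_id" },
      pipeline: [
        { $match: { $expr: { $in: ["$$id", "$moods"] } } },
        { $count: "n" }  // yields [{ n: <matches> }] or [] when there are none
      ],
      as: "count"
    }
  },
  {
    $addFields: {
      // take n from the single result document, defaulting to 0 for an empty array
      count: { $ifNull: [{ $arrayElemAt: ["$count.n", 0] }, 0] }
    }
  }
];

// In the shell: db.moods.aggregate(moodCountPipeline)
```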
Did this work?
db.collection.aggregate().count()
Try combining the functions; it might work.
I am using below query to get combined data from users and project collections:
db.collection.aggregate([
{
"$group": {
"_id": "$userId",
"projectId": { "$push": "$projectId" }
}
},
{
"$lookup": {
"from": "users",
"let": { "userId": "$_id" },
"pipeline": [
{ "$match": { "$expr": { "$eq": [ "$_id", "$$userId" ] }}},
{ "$project": { "firstName": 1 }}
],
"as": "user"
}
},
{ "$unwind": "$user" },
{
"$lookup": {
"from": "projects",
"let": { "projectId": "$projectId" },
"pipeline": [
{ "$match": { "$expr": { "$in": [ "$_id", "$$projectId" ] }}},
{ "$project": { "projectName": 1 }}
],
"as": "projects"
}
}
])
and it results like below:
[
{
"_id": "5c0a29e597e71a0d28b910aa",
"projectId": [
"5c0a2a8897e71a0d28b910ac",
"5c0a4083753a321c6c4ee024"
],
"user": {
"_id": "5c0a29e597e71a0d28b910aa",
"firstName": "Amit"
},
"projects": [
{
"_id": "5c0a2a8897e71a0d28b910ac",
"projectName": "LN-PM"
},
{
"_id": "5c0a4083753a321c6c4ee024",
"projectName": "fallbrook winery"
}
]
},
{
"_id": "5c0a29c697e71a0d28b910a9",
"projectId": [
"5c0a4083753a321c6c4ee024"
],
"user": {
"_id": "5c0a29c697e71a0d28b910a9",
"firstName": "Rajat"
},
"projects": [
{
"_id": "5c0a4083753a321c6c4ee024",
"projectName": "fallbrook winery"
}
]
}
]
Now I have another collection, "worksheets", and I want to include an hours field in each element of the projects array. It should be calculated from the worksheets collection using the projectId, which is the _id in the projects array: every worksheet document with that projectId adds its hours to the total. Below is my worksheets collection:
{
"_id" : ObjectId("5c0a4efa91b5021228681f7a"),
"projectId" : ObjectId("5c0a4083753a321c6c4ee024"),
"hours" : 8,
"userId" : ObjectId("5c0a29c697e71a0d28b910a9"),
"__v" : 0
}
{
"_id" : ObjectId("5c0a4f4191b5021228681f7c"),
"projectId" : ObjectId("5c0a2a8897e71a0d28b910ac"),
"hours" : 6,
"userId" : ObjectId("5c0a29e597e71a0d28b910aa"),
"__v" : 0
}
The result will look like below:
{
"_id": "5c0a29c697e71a0d28b910a9",
"projectId": [
"5c0a4083753a321c6c4ee024"
],
"user": {
"_id": "5c0a29c697e71a0d28b910a9",
"firstName": "Rajat"
},
"projects": [
{
"_id": "5c0a4083753a321c6c4ee024",
"projectName": "fallbrook winery",
"hours":8
}
]
}
You can use the aggregation below.
The $lookup pipeline syntax introduced in MongoDB 3.6 lets you nest another $lookup inside the $lookup pipeline, so the whole aggregation can be performed inside the nested pipeline.
db.collection.aggregate([
{ "$group": {
"_id": "$userId",
"projectId": { "$push": "$projectId" }
}},
{ "$lookup": {
"from": "users",
"let": { "userId": "$_id" },
"pipeline": [
{ "$match": { "$expr": { "$eq": [ "$_id", "$$userId" ] }}},
{ "$project": { "firstName": 1 }}
],
"as": "user"
}},
{ "$unwind": "$user" },
{ "$lookup": {
"from": "projects",
"let": { "projectId": "$projectId" },
"pipeline": [
{ "$match": { "$expr": { "$in": [ "$_id", "$$projectId" ] }}},
{ "$lookup": {
"from": "worksheets",
"let": { "projectId": "$_id" },
"pipeline": [
{ "$match": { "$expr": { "$eq": [ "$projectId", "$$projectId" ] }}},
{ "$group": {
"_id": "$projectId",
"totalHours": { "$sum": "$hours" }
}}
],
"as": "workHours"
}},
{ "$project": {
"projectName": 1,
"hours": { "$arrayElemAt": ["$workHours.totalHours", 0] }
}}
],
"as": "projects"
}}
])
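One detail worth hedging: for a project without any worksheets, workHours is an empty array, so $arrayElemAt yields no value and the hours field is simply absent from the output. If a 0 is preferred, the inner $project stage could be sketched like this (hoursProjectStage is a hypothetical name for the replacement stage):

```javascript
// Sketch only: drop-in replacement for the inner $project stage of the projects lookup.
const hoursProjectStage = {
  $project: {
    projectName: 1,
    // fall back to 0 when no worksheet matched this project
    hours: { $ifNull: [{ $arrayElemAt: ["$workHours.totalHours", 0] }, 0] }
  }
};
```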