performing $lookup on subset of collection - mongodb

I have this data
[
{"_id":0,"a":1,"b":1,"source":1},
{"_id":1,"a":1,"c":4,"source":1},
{"_id":2,"a":2,"d":6,"source":1},
{"_id":3,"a":2,"e":6,"source":1},
{"_id":4,"a":2,"f":6,"source":1},
{"_id":5,"a":3,"d":6,"source":1},
{"_id":6,"a":3,"b":1,"source":1},
{"_id":7,"a":3,"f":6,"source":1},
{"_id":8,"a":3,"qq":3,"source":2},
{"_id":9,"a":3,"fl":6,"source":2}
]
I want to return all documents whose a field is equal to the a field of a document that has a field b. Furthermore, all must be from source 1.
The final result should be this:
[
{"_id":0,a":1,"b":1,"source":1},
{"_id":1,"a":1,"c":4,"source":1},
{"_id":5,"a":3,"d":6,"source":1},
{"_id":6,"a":3,"b":1,"source":1},
{"_id":7,"a":3,"f":6,"source":1}
]
The following query gives me the results I want:
myCollection.aggregate([{"$match":{"b":{"$exists":true},"source":1}},
{"$group":{"_id":null, "a":{"$addToSet":"$a"}}},
{"$unwind":{"path":"$a"}},
{"$project":{"_id":false}},
{"$lookup":
{"from": "myCollection",
"localField":"a",
"foreignField":"a",
"as":"results"}},
{"$project":{"a":false}},
{"$unwind":{"path":"$results"}},
{"$replaceRoot":{"newRoot":"$results"}},
{"$match":{"source":1}}
])
However, having to add that last {"$match":{"source":1}} statement got me thinking that for large sets of data the $lookup statement is going to produce a lot of unwanted results that will then be filtered out by my last $match statement. Is there any way to prevent their generation by limiting $lookup to documents from myCollection where source equals 1?
i.e. replace
{"$lookup":
{"from": "myCollection"
with something like
{"$lookup":
{"from": myCollection.match({"source":1})
Alternatively, is there a more efficient pipeline I could be using?

You can filter documents in the pipeline of the $lookup stage. This helps gain some performance and avoids unnecessary results. You can use it like below:
{
  "$lookup": {
    "from": "collection",
    "let": { "a_": "$a" },
    "pipeline": [
      { "$match": {
          "$expr": {
            "$and": [
              { "$eq": ["$source", 1] },
              { "$eq": ["$a", "$$a_"] }
            ]
          }
        }
      }
    ],
    "as": "results"
  }
}
Your $project stage,
{"$project":{"a":false}}
is actually unnecessary; you can omit it.
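To convince yourself that filtering inside the $lookup sub-pipeline returns the same documents as the original trailing $match, here is a plain-Python stand-in for the join over the sample data (an illustration of the logic only, not MongoDB itself):

```python
# Sample data from the question.
docs = [
    {"_id": 0, "a": 1, "b": 1, "source": 1},
    {"_id": 1, "a": 1, "c": 4, "source": 1},
    {"_id": 2, "a": 2, "d": 6, "source": 1},
    {"_id": 3, "a": 2, "e": 6, "source": 1},
    {"_id": 4, "a": 2, "f": 6, "source": 1},
    {"_id": 5, "a": 3, "d": 6, "source": 1},
    {"_id": 6, "a": 3, "b": 1, "source": 1},
    {"_id": 7, "a": 3, "f": 6, "source": 1},
    {"_id": 8, "a": 3, "qq": 3, "source": 2},
    {"_id": 9, "a": 3, "fl": 6, "source": 2},
]

# "a" values of source-1 documents that have a "b" field
# (the initial $match plus $group/$addToSet stages).
a_values = {d["a"] for d in docs if "b" in d and d["source"] == 1}

# Original approach: join on "a", then filter source == 1 afterwards.
post_filtered = [d for d in docs if d["a"] in a_values and d["source"] == 1]

# Sub-pipeline approach: the source == 1 condition is applied inside the join.
pre_filtered = [d for d in docs if d["source"] == 1 and d["a"] in a_values]

assert post_filtered == pre_filtered
print([d["_id"] for d in post_filtered])  # [0, 1, 5, 6, 7]
```

Either way the surviving documents are _id 0, 1, 5, 6, and 7, matching the expected output in the question; the sub-pipeline version simply avoids materializing the source-2 documents first.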

Related

Using conditions for both collections (original and foreign) in lookup $match

I'm not sure if this is a real problem or just a lack of documentation.
You can put conditions on documents in the foreign collection in a $lookup $match.
You can also put conditions on documents of the original collection in a $lookup $match with $expr.
But when I try to use both of those features together, it doesn't work. This is a sample lookup in an aggregation:
{ $lookup:
{
from: 'books',
localField: 'itemId',
foreignField: '_id',
let: { "itemType": "$itemType" },
pipeline: [
{ $match: { $expr: { $eq: ["$$itemType", "book"] } }}
],
as: 'bookData'
}
}
$expr puts a condition on the original documents. But what if I want to get only foreign documents with status: 'OK'? Something like:
{ $match: { status: "OK", $expr: { $eq: ["$$itemType", "book"] } }}
Does not work.
I played around with the situation you provided.
Try putting $expr as the first key of the $match object, and it should do the thing.
{ $lookup:
{
from: 'books',
localField: 'itemId',
foreignField: '_id',
let: { "itemType": "$itemType" },
pipeline: [
{ $match: { $expr: { $eq: ["$$itemType", "book"] }, status: 'OK' }}
],
as: 'bookData'
}
}
The currently accepted answer is "wrong" in the sense that it doesn't actually change anything. The ordering that the fields for the $match predicate are expressed in does not make a difference. I would demonstrate this with your specific situation, but there is an extra complication there which we will get to in a moment. In the meantime, consider the following document:
{
_id: 1,
status: "OK",
key: 123
}
This query:
db.collection.find({
status: "OK",
$expr: {
$eq: [
"$key",
123
]
}
})
And this query, which just has the order of the predicates reversed:
db.collection.find({
$expr: {
$eq: [
"$key",
123
]
},
status: "OK"
})
Will both find and return that document. A playground demonstration of the first can be found here and the second one is here.
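The order-independence is easy to check mechanically. Here is a plain-Python stand-in for the two find() predicates above (illustrative only; real $expr evaluation happens server-side):

```python
# The sample document from above.
doc = {"_id": 1, "status": "OK", "key": 123}

# The two predicates of the find() queries, as plain functions.
def status_ok(d):
    return d.get("status") == "OK"

def key_eq(d):  # stands in for $expr: { $eq: ["$key", 123] }
    return d.get("key") == 123

# status first, then the $expr stand-in ...
first_order = status_ok(doc) and key_eq(doc)
# ... and the $expr stand-in first, then status.
second_order = key_eq(doc) and status_ok(doc)

assert first_order is True
assert second_order is True
```

Both conjunctions are true for the document regardless of which predicate is written first, which is exactly what the two playground queries demonstrate.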
Similarly, your original $match:
{ $match: { status: "OK", $expr: { $eq: ["$$itemType", "book"] } }}
Will behave the same as the one in the accepted answer:
{ $match: { $expr: { $eq: ["$$itemType", "book"] }, status: 'OK' }}
Said another way, there is no difference in behavior based on whether or not the $expr is used first. However, I suspect the overall aggregation is not expressing your desired logic. Let's explore that a little further. First, we need to address this:
$expr puts a condition on the original documents.
This is not really true. According to the documentation for $expr, that operator "allows the use of aggregation expressions within the query language."
A primary use of this functionality, and indeed the first one listed in the documentation, is to compare two fields from a single document. In the context of $lookup, this ability to refer to fields from the original documents allows you to compare their values against the collection that you are joining with. The documentation has some examples of that, such as here and other places on that page which refer to $expr.
With that in mind, let's come back to your aggregation. If I am understanding correctly, your intent with the { $expr: { $eq: ["$$itemType", "book"] } predicate is to filter documents from the original collection. Is that right?
If so, then that is not what your aggregation is currently doing. You can see in this playground example that the $match nested inside of the $lookup pipeline does not affect the documents from the original collection. Instead, you should do that filtering via an initial $match on the base pipeline. So something like this:
db.orders.aggregate([
{
$match: {
$expr: {
$eq: [
"$itemType",
"book"
]
}
}
}
])
Or, more simply, this:
db.orders.aggregate([
{
$match: {
"itemType": "book"
}
}
])
Based on all of this, your final pipeline should probably look similar to the following:
db.orders.aggregate([
{
$match: {
"itemType": "book"
}
},
{
$lookup: {
from: "books",
localField: "itemId",
foreignField: "_id",
let: {
"itemType": "$itemType"
},
pipeline: [
{
$match: {
status: "OK"
}
}
],
as: "bookData"
}
}
])
Playground example here. This pipeline:
Filters the data in the original collection (orders) by their itemType. From the sample data, it removes the document with _id: 3 as it has a different itemType than the one we are looking for ("book").
It uses the localField/foreignField syntax to find data in books where the _id of the books document matches the itemId of the source document(s) in the orders collection.
It further uses the let/pipeline syntax to express the additional condition that the status of the books document is "OK". This is why the books document with status "BAD" does not get pulled into the bookData for the orders document with _id: 2.
Documentation for the (combined) second and third parts is here.
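Putting it all together, the final pipeline's behavior can be sketched in plain Python, using hypothetical sample data consistent with the description above (the orders document with _id: 3 has a different itemType, and one books document has status "BAD"):

```python
# Hypothetical sample data matching the description in the answer.
orders = [
    {"_id": 1, "itemId": 101, "itemType": "book"},
    {"_id": 2, "itemId": 102, "itemType": "book"},
    {"_id": 3, "itemId": 103, "itemType": "dvd"},
]
books = [
    {"_id": 101, "status": "OK"},
    {"_id": 102, "status": "BAD"},
]

result = []
for order in orders:
    if order["itemType"] != "book":  # initial $match on the base pipeline
        continue
    order = dict(order)
    # localField/foreignField join plus the status filter from the
    # $lookup sub-pipeline.
    order["bookData"] = [
        b for b in books
        if b["_id"] == order["itemId"] and b["status"] == "OK"
    ]
    result.append(order)

assert [o["_id"] for o in result] == [1, 2]  # _id: 3 filtered out up front
assert result[1]["bookData"] == []           # the "BAD" book is excluded
```

The order with the wrong itemType never reaches the join, and the "BAD" book never enters bookData, mirroring the two filters in the aggregation.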

Mongodb - aggregate match subarray

I am trying to match data in a subarray; for some reason it is grouped like this.
Data :
{
"_id": 1,
"addresDetails": [
[
{
"Name":"John",
"Place":"Berlin",
"Pincode":"10001"
},
{
"Name":"Sarah",
"Place":"Newyork",
"Pincode":"10002"
}
],
[
{
"Name":"Mark",
"Place":"Tokyo",
"Pincode":"10003"
},
{
"Name":"Michael",
"Place":"Newyork",
"Pincode":"10002"
}
]
]
}
I tried with this Match query:
{
"$match":{
"attributes":{
"$elemMatch":{
"$in":["Mark"]
}
}
}
}
I am getting no data found. How do I match the elements in these subarrays?
Query
The aggregation way: in general, if you are stuck and the query or update operators don't seem to be enough, aggregation provides many more operators and is the alternative.
Two nested $filter operations over the two array levels find a Name in the array ["Mark"].
*Maybe there is a shorter, more declarative way with $elemMatch, and possibly a way to use an index; also think about a schema change, since maybe you don't really need arrays as array members (the query below doesn't use an index).
*I used the spelling addressDetails; remove the extra s to match your field name (addresDetails), or you will get empty results.
Playmongo
aggregate(
[{"$match":
{"$expr":
{"$ne":
[{"$filter":
{"input": "$addressDetails",
"as": "a",
"cond":
{"$ne":
[{"$filter":
{"input": "$$a",
"as": "d",
"cond": {"$in": ["$$d.Name", ["Mark"]]}}},
[]]}}},
[]]}}}])
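Read from the inside out, the nested $filter logic keeps the document when some inner array contains an element whose Name is in ["Mark"]. A plain-Python rendering of that condition (illustration only, not MongoDB itself):

```python
# The sample document (using the addressDetails spelling from the answer).
doc = {
    "_id": 1,
    "addressDetails": [
        [{"Name": "John", "Place": "Berlin", "Pincode": "10001"},
         {"Name": "Sarah", "Place": "Newyork", "Pincode": "10002"}],
        [{"Name": "Mark", "Place": "Tokyo", "Pincode": "10003"},
         {"Name": "Michael", "Place": "Newyork", "Pincode": "10002"}],
    ],
}

def matches(d, names):
    # Outer $filter: keep inner arrays whose own (inner) $filter result
    # is non-empty.
    outer = [
        inner for inner in d["addressDetails"]
        if [e for e in inner if e["Name"] in names] != []
    ]
    # The surrounding $expr/$ne keeps the document if the outer result
    # is non-empty.
    return outer != []

assert matches(doc, ["Mark"]) is True
assert matches(doc, ["Alice"]) is False
```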
You can apparently nest elemMatch as well, e.g.:
db.collection.find({
"addresDetails": {
$elemMatch: {
$elemMatch: {
"Name": "Mark"
}
}
}
})
This matches your document, as shown by this mongo playground link, but is probably not very efficient.
Alternatively you can use aggregations. For example, $unwind may help flatten out your nested arrays and allow for an easier $match afterwards.
db.collection.aggregate([
{
"$unwind": "$addresDetails"
},
{
"$match": {
"addresDetails.Name": "Mark"
}
}
])
You can find the mongo playground link for this here. But unwind is usually not preferred as the first stage of the aggregation pipeline either, again because of performance reasons.
Also please note that the results for these 2 options are different!
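To make that difference concrete, here is a plain-Python sketch of what each option returns for the sample document (illustrative only, not MongoDB itself):

```python
doc = {
    "_id": 1,
    "addresDetails": [
        [{"Name": "John", "Place": "Berlin", "Pincode": "10001"},
         {"Name": "Sarah", "Place": "Newyork", "Pincode": "10002"}],
        [{"Name": "Mark", "Place": "Tokyo", "Pincode": "10003"},
         {"Name": "Michael", "Place": "Newyork", "Pincode": "10002"}],
    ],
}

# Option 1: nested $elemMatch in find() returns the whole document untouched.
def find_nested_elemmatch(d, name):
    hit = any(any(e.get("Name") == name for e in inner)
              for inner in d["addresDetails"])
    return [d] if hit else []

# Option 2: $unwind then $match returns one document per matching *outer*
# element, with addresDetails replaced by that inner array.
def unwind_then_match(d, name):
    return [{**d, "addresDetails": inner}
            for inner in d["addresDetails"]
            if any(e.get("Name") == name for e in inner)]

r1 = find_nested_elemmatch(doc, "Mark")
r2 = unwind_then_match(doc, "Mark")

assert len(r1[0]["addresDetails"]) == 2            # both inner arrays kept
assert r2[0]["addresDetails"][0]["Name"] == "Mark"  # only the matching one
```

The find() variant keeps the full nested structure, while the unwind variant flattens one level and drops the non-matching inner array, which is the difference the answer warns about.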

Mongoose aggregate query to filter value from the ref collection

I have 2 collections as follow:
event
{
"_id" : ObjectId("61f272dd1fac703fec69105a"),
"eventActivity" : [
ObjectId("61f76703196ea94bd43fa92e"),
]
}
event-activity
{
"_id" : ObjectId("61f76703196ea94bd43fa92e"),
"activity" : ObjectId("61f2a69bfe99e07db083de50"),
}
Based on the collections above, event has an eventActivity field which refers to the event-activity collection. I'm trying to filter the event by the value of event-activity.activity.
So if for example my filtration selection has activity in an array ['61d6b2060d6fe32d9853ad40', '61f2a69bfe99e07db083de50'], it will return the event. If the filtration selection has activity id ['61d6b2060d6fe32d9853ad40'], it should not return any event as there is no event with that activity id from event-activity
I can't really understand how the aggregate $lookup works, but I tried this and it doesn't work.
event.aggregate([
{"$lookup":{
"from":"event-activity",
"localField":"activity",
"foreignField":"_id",
"as":"event-activity"
}},
{
"$match":{
"event-activity.activity":{
"$in":["61d6b2060d6fe32d9853ad40","61f2a69bfe99e07db083de50"]
}
}
}
])
I referred to the manual here
Or can it be done by find() instead?
Query
You can use $lookup with a pipeline and put the $match inside it.
If the lookup result is empty, you can remove or keep the document based on your needs, with something like this:
{"$match":{"$expr":{"$ne":["$activities", []]}}}
Test code here
event.aggregate(
[{"$lookup":
{"from":"event-activity",
"localField":"eventActivity",
"foreignField":"_id",
"pipeline":
[{"$match":
{"activity":
{"$in":
[ObjectId("61d6b2060d6fe32d9853ad40"),
ObjectId("61f2a69bfe99e07db083de50")]}}}],
"as":"activities"}}])
If I've understood correctly you can use this aggregation query:
This query uses a $lookup with a pipeline where the result is given by a match with an $in. So, the join will return the values where the event-activity.activity is in the array event.eventActivity.
db.event.aggregate([
{
"$lookup": {
"from": "event-activity",
"as": "activities",
"let": {
"ea": "$eventActivity"
},
"pipeline": [
{
"$match": {
"$expr": {
"$in": [
"$activity",
"$$ea"
]
}
}
}
]
}
}
])
Example here, where I've used integers as activity to see the join more easily.
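The first answer's pipeline (a join on eventActivity/_id plus the selection filter inside the $lookup, then optionally dropping events whose lookup result is empty) can be sketched in plain Python as follows (illustrative only; small integers stand in for the ObjectIds):

```python
# Hypothetical data: one event referencing one event-activity document.
events = [{"_id": 1, "eventActivity": [10]}]
event_activities = [{"_id": 10, "activity": 50}]

def run(selection):
    out = []
    for ev in events:
        # localField/foreignField join plus the $match inside the
        # $lookup pipeline.
        acts = [ea for ea in event_activities
                if ea["_id"] in ev["eventActivity"]
                and ea["activity"] in selection]
        joined = {**ev, "activities": acts}
        # The optional follow-up stage: drop events whose lookup
        # result is empty.
        if joined["activities"] != []:
            out.append(joined)
    return out

# A selection containing activity 50 returns the event ...
assert len(run([40, 50])) == 1
# ... while a selection without it returns nothing.
assert run([40]) == []
```

This mirrors the behavior the question asks for: the event is returned only when the filtration selection contains an activity id actually referenced via event-activity.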

Using object path in $lookup mongo aggregation pipeline

For today's task I am aggregating documents in a collection (let's call it collection1), and in one of the pipeline's stages I am trying to use $lookup to retrieve documents from another collection (let's call it collection2).
collection1 object model:
{
"field1": "value1",
"field2": "value2"
"field3": "value3"
}
collection2 object model:
{
"field1": "value1",
"field2": "value2",
"field3": {
"field31": "value31",
"field32": "value32"
}
}
What I am trying to do, exactly, is to retrieve the documents from collection2 where field3.field31 equals the value of collection1's field1.
My $lookup stage looks approximately like this, but currently it doesn't seem to work. I did not find any clue as to whether this should work, but I'm looking forward to your replies.
{
$lookup: {
from: "collection2",
let: {
"c": "$field1",
"l": "$field2",
"t": "$field3",
},
pipeline: [
{
$match: {
$expr: {
$and: [
{ $eq: ["$field1", "$$c"] },
{ $eq: ["$field2", "$$l"] },
{ $eq: ["$field3.field31", "$$t"] },
]
}
},
},
],
as: "awesomejoin"
}
}
I want to avoid having a project or a group and then unwinding and filtering again. My wish is to get the records directly from the match stage thinking this is better in terms of performance...
Let me know your thoughts on this.
Thank you
Please try this:
db.Collection1.aggregate([
{
$lookup:
{
from: "Collection2",
localField: "field1",
foreignField: "field3.field31",
as: "docs"
}
}
])
It should be simple with a plain $lookup; I'm not exactly sure why you're creating local variables and then checking equality on those same variables. Also, $unwind is used on arrays; for objects you can access inner elements using dot notation, much like in programming languages.
Ref : $lookup
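The plain localField/foreignField join with a dotted path can be sketched in plain Python like this (illustrative only; the sample values are hypothetical and chosen so that one document actually matches):

```python
collection1 = [{"field1": "value1", "field2": "value2", "field3": "value3"}]
collection2 = [
    {"field1": "value1", "field2": "value2",
     "field3": {"field31": "value1", "field32": "value32"}},
    {"field1": "x", "field2": "y",
     "field3": {"field31": "other", "field32": "z"}},
]

results = []
for doc in collection1:
    # foreignField "field3.field31" reaches into the nested object,
    # compared against localField "field1".
    docs = [c2 for c2 in collection2
            if c2["field3"]["field31"] == doc["field1"]]
    results.append({**doc, "docs": docs})

assert len(results[0]["docs"]) == 1
assert results[0]["docs"][0]["field3"]["field31"] == "value1"
```

Only the collection2 document whose nested field31 equals the collection1 document's field1 lands in the docs array, which is all the dotted foreignField does.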

MongoDB Aggregation Lookup with Pipeline Doesn't Work

I have two collections. I am trying to add the documents of Collection 2 to Collection 1, if number1 and number2 in Collection 2 are within a certain range as specified in Collection 1. FYI, the ObjectId in Collection 1 and the ObjectId in Collection 2 refer to two different items/products, hence I cannot join the two collections on this id.
Example Document from Collection 1:
{'_id': ObjectId('4321'),
'number1_lb': 61.205672407820025,
'number1_ub': 61.24170844385606,
'number2_lb': -149.75074963516136,
'number2_ub': -149.71471359912533}
Example Document from Collection 2:
{'_id': ObjectId('1234'),
'number1': 1.282298,
'number2': 103.8475}
I want the output:
{'_id': ObjectId('4321'),
'number1_lb': 61.205672407820025,
'number1_ub': 61.24170844385606,
'number2_lb': -149.75074963516136,
'number2_ub': -149.71471359912533,
'recs': [ObjectId('3456'), ObjectId('4567'),...]}
I thought that a lookup stage with pipeline would work. My code is currently as follows:
{"$lookup":{
"from": "Collection 2",
"let":{
"number1_lb":"$number1_lb",
"number1_ub":"$number1_ub",
"number2_lb":"$number2_lb",
"number2_ub":"$number2_ub"
},
"pipeline": [
{"$match":
{"$expr":
{"$and":[
{"$gte":["$number1","$$number1_lb"]},
{"$gte":["$number2","$$number2_lb"]},
{"$lte":["$number1","$$number1_ub"]},
{"$lte":["$number2","$$number2_ub"]}
]}}}
],
"as": "recs"
}}
But running the above gives me no output. Am I doing something wrong?
I ran it and it seems to work fine, but I had to tweak your input data in coll1 as it didn't meet the $match criteria.
from pymongo import MongoClient
from bson.json_util import dumps
db = MongoClient()["testdatabase"]
# Data Setup
db.coll1.replace_one({"_id": "4321"}, {"_id": "4321", "number1_lb": -61.205672407820025, "number1_ub": 61.24170844385606, "number2_lb": -149.75074963516136, "number2_ub": 149.71471359912533}, upsert=True)
db.coll2.replace_one({"_id": "1234"}, {"_id": "1234", "number1": 1.282298, "number2": 103.8475}, upsert=True)
# Run the aggregation
results = db.coll1.aggregate([
{"$lookup": {
"from": "coll2",
"let": {
"number1_lb": "$number1_lb",
"number1_ub": "$number1_ub",
"number2_lb": "$number2_lb",
"number2_ub": "$number2_ub"
},
"pipeline": [
{"$match":
{"$expr":
{"$and": [
{"$gte": ["$number1", "$$number1_lb"]},
{"$gte": ["$number2", "$$number2_lb"]},
{"$lte": ["$number1", "$$number1_ub"]},
{"$lte": ["$number2", "$$number2_ub"]}
]}}}
],
"as": "recs"
}}
])
# pretty up the results
print(dumps(list(results), indent=4))
gives:
[
{
"_id": "4321",
"number1_lb": -61.205672407820025,
"number1_ub": 61.24170844385606,
"number2_lb": -149.75074963516136,
"number2_ub": 149.71471359912533,
"recs": [
{
"_id": "1234",
"number1": 1.282298,
"number2": 103.8475
}
]
}
]
You are looking to use a $lookup and a $project:
{
$lookup: {
from: "Collection2",
localField: [field of Collection1 to join on],
foreignField: [matching field of the foreign collection, here Collection2],
as: "nameJoint"
}
},
{$project: {
"newFieldName":
}},
But to make a join between two documents there has to be a common field between those two documents. I am not sure there is one in this situation, or maybe I misunderstand it.
(A $lookup is basically a SQL join in NoSQL.)