I have a collection which contains documents with multiple arrays. These are generally quite large, but for purposes of explaining you can consider the following two documents:
{
"obj1": [
{ "a": "a", "b": "b" },
{ "a": "a", "b": "c" },
{ "a": "a", "b": "b" }
],
"obj2": [
{ "a": "a", "b": "b" },
{ "a": "a", "b": "c" }
]
},
{
"obj1": [
{ "a": "c", "b": "b" }
],
"obj2": [
{ "a": "c", "b": "c" }
]
}
The idea is to get just the array elements that match the query. Multiple matches are required, and within multiple arrays, so this is beyond what can be done with projection and the positional $ operator. The desired result would be like:
{
"obj1": [
{ "a": "a", "b": "b" },
{ "a": "a", "b": "b" }
],
"obj2": [
{ "a": "a", "b": "b" }
]
}
A traditional approach would be something like this:
db.objects.aggregate([
{ "$match": {
"obj1": {
"$elemMatch": { "a": "a", "b": "b" }
},
"obj2.b": "b"
}},
{ "$unwind": "$obj1" },
{ "$match": {
"obj1.a": "a",
"obj1.b": "b"
}},
{ "$unwind": "$obj2" },
{ "$match": { "obj2.b": "b" }},
{ "$group": {
"_id": "$_id",
"obj1": { "$addToSet": "$obj1" },
"obj2": { "$addToSet": "$obj2" }
}}
])
But the use of $unwind there for both arrays causes the overall set to use a lot of memory and slows things down. There are also possible problems there with $addToSet and splitting the $group stages for each array can make things even slower.
So I am looking for a process that is not so intensive but arrives at the same result.
Since MongoDB 3.2 we have the $filter operator, which makes this really quite simple:
db.objects.aggregate([
{ "$match": {
"obj1": {
"$elemMatch": { "a": "a", "b": "b" }
},
"obj2.b": "b"
}},
{ "$project": {
"obj1": {
"$filter": {
"input": "$obj1",
"as": "el",
"cond": {
"$and": [
{ "$eq": [ "$$el.a", "a" ] },
{ "$eq": [ "$$el.b", "b" ] }
]
}
}
},
"obj2": {
"$filter": {
"input": "$obj2",
"as": "el",
"cond": { "$eq": [ "$$el.b", "b" ] }
}
}
}}
])
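To make concrete what the $filter expressions above compute for each document, here is a minimal plain JavaScript sketch that imitates the $project stage with Array.prototype.filter. It does not talk to MongoDB at all; the helper name filterDoc and the variable doc are made up for illustration, and the real stage would also carry the _id field along.

```javascript
// Plain JavaScript imitation of the $project/$filter stage above.
// `filterDoc` is a hypothetical helper, not a MongoDB API.
function filterDoc(doc) {
  return {
    // same test as the $and of two $eq conditions on obj1
    obj1: doc.obj1.filter((el) => el.a === "a" && el.b === "b"),
    // same test as the single $eq condition on obj2
    obj2: doc.obj2.filter((el) => el.b === "b"),
  };
}

// First sample document from the question
const doc = {
  obj1: [
    { a: "a", b: "b" },
    { a: "a", b: "c" },
    { a: "a", b: "b" },
  ],
  obj2: [
    { a: "a", b: "b" },
    { a: "a", b: "c" },
  ],
};

console.log(JSON.stringify(filterDoc(doc)));
```

Both matching obj1 elements survive here, since $filter keeps duplicates and the original element order.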
For earlier releases, MongoDB 2.6 introduced the $map operator, which can act on arrays in place without the need to $unwind. Combined with some other logical operators and the set operators that were added to the aggregation framework, there is a solution to this problem and others:
db.objects.aggregate([
{ "$match": {
"obj1": {
"$elemMatch": { "a": "a", "b": "b" }
},
"obj2.b": "b"
}},
{ "$project": {
"obj1": {
"$setDifference": [
{ "$map": {
"input": "$obj1",
"as": "el",
"in": {
"$cond": [
{ "$and": [
{ "$eq": [ "$$el.a", "a" ] },
{ "$eq": [ "$$el.b", "b" ] }
]},
"$$el",
false
]
}
}},
[false]
]
},
"obj2": {
"$setDifference": [
{ "$map": {
"input": "$obj2",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.b", "b" ] },
"$$el",
false
]
}
}},
[false]
]
}
}}
])
The core of this is the $map operator, which works like an internalized $unwind: it processes all the array elements, but also allows operations to act on those elements in the same statement. Typically this would be done in several pipeline stages, but here we can process within a single $project, $group or $redact stage.
In this case the inner processing uses the $cond operator, which combines with a logical condition in order to return a different result for true or false. Here we use the $eq operator to test values of the fields within the current element, in much the same way as a separate $match pipeline stage would. The $and condition is another logical operator, which combines the results of multiple conditions on the element, much as the $elemMatch operator would within a $match pipeline stage.
Finally, since the $cond operator returns either the current element or false when the condition is not met, we need to "filter" any false values from the array produced by the $map operation. This is where the $setDifference operator is used to compare the two input arrays and return the difference. So when compared to an array that contains only false as its element, the result is the elements returned from $map without the false values that came out of $cond when the conditions were not met. Because the result is a "set", identical matching elements are reduced to a single entry.
The result keeps only the matching elements from the array without having to run through separate pipeline stages for $unwind, $match and $group.
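The $map, $cond and $setDifference combination can be imitated in plain JavaScript to see each intermediate value. This is only a sketch: mapCond and setDifference are hypothetical helpers, and comparing values by their JSON text is a rough stand-in for how MongoDB compares BSON values.

```javascript
// Hypothetical helpers imitating $map + $cond and $setDifference.
function mapCond(array, test) {
  // like $map with $cond: keep the element or substitute false
  return array.map((el) => (test(el) ? el : false));
}

function setDifference(left, right) {
  const exclude = new Set(right.map((v) => JSON.stringify(v)));
  const seen = new Set();
  const result = [];
  for (const value of left) {
    const key = JSON.stringify(value);
    if (!exclude.has(key) && !seen.has(key)) {
      seen.add(key); // a set holds each distinct value only once
      result.push(value);
    }
  }
  return result;
}

const obj1 = [
  { a: "a", b: "b" },
  { a: "a", b: "c" },
  { a: "a", b: "b" },
];

// Step 1: non-matching elements become false
const mapped = mapCond(obj1, (el) => el.a === "a" && el.b === "b");
console.log(JSON.stringify(mapped));

// Step 2: remove false, leaving the distinct matching elements
console.log(JSON.stringify(setDifference(mapped, [false])));
```

The second step leaves a single { a: "a", b: "b" } entry even though two elements matched, which is the "set" behavior to keep in mind when duplicates matter.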
To return more than one match, build the conditions from the request:
const { timeSlots } = req.body;
let ts = [];
for (const slot of timeSlots) {
ts.push({
$eq: ['$$timeSlots.id',slot.id],
});
}
const products = await Product.aggregate<ProductDoc>([
{
$match: {
_id: req.params.productId,
recordStatus: RecordStatus.Active,
},
},
{
$project: {
timeSlots: {
$filter: {
input: '$timeSlots',
as: 'timeSlots',
cond: {
$or: ts,
},
},
},
name: 1,
mrp: 1,
},
},
]);
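As a possible simplification of the loop above, the list of $eq conditions under $or can be collapsed into a single test with the aggregation $in operator, which as far as I know is available from MongoDB 3.4. The sketch below only builds the expression object; timeSlotFilter, the "slot" variable name and the ids "s1" and "s2" are made up for illustration.

```javascript
// Hypothetical helper: builds a $filter expression for the $project stage.
function timeSlotFilter(slotIds) {
  return {
    $filter: {
      input: "$timeSlots",
      as: "slot",
      // one $in test instead of a list of $eq conditions under $or
      cond: { $in: ["$$slot.id", slotIds] },
    },
  };
}

// Example ids standing in for req.body.timeSlots.map((slot) => slot.id)
const expr = timeSlotFilter(["s1", "s2"]);
console.log(JSON.stringify(expr, null, 2));
```

The returned object would take the place of the value of timeSlots inside $project.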
Related
I have this Collection in mongodb
{
"_id" : "777",
"someKey" : "someValue",
"someArray" : [
{
"name" : "name1",
"someNestedArray" : [
{
"name" : "value"
},
{
"name" : "delete me"
}
]
}
]
}
I want to find documents based on someArray.someNestedArray.name,
but I can't find any useful link; all the search results are about updating nested arrays.
I am trying the following, but it returns nothing:
db.mycollection.find({"someArray.$.someNestedArray":{"$elemMatch":{"name":"1"}}})
db.mycollection.find({"someArray.$.someNestedArray.$.name":"1"})
and some other variations.
How can I find a document by an element in a doubly nested array in MongoDB?
In the simplest sense this just follows the basic form of "dot notation" as used by MongoDB. That will work regardless of which array member the inner array member is in, as long as it matches a value:
db.mycollection.find({
"someArray.someNestedArray.name": "value"
})
That is fine for a "single field" value; for matching multiple fields you would use $elemMatch:
db.mycollection.find({
"someArray": {
"$elemMatch": {
"name": "name1",
"someNestedArray": {
"$elemMatch": {
"name": "value",
"otherField": 1
}
}
}
}
})
That matches the document which contains something with a field at that "path" matching the value. If you intended to "match and filter" the result so only the matched element was returned, this is not possible with the positional operator projection, as quoted:
Nested Arrays
The positional $ operator cannot be used for queries which traverse more than one array, such as queries that traverse arrays nested within other arrays, because the replacement for the $ placeholder is a single value
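The difference between the two query forms can be imitated in plain JavaScript. This is a sketch of the matching logic only, not MongoDB code; matchesDotNotation and matchesElemMatch are hypothetical names.

```javascript
// Sample document from the question
const doc = {
  someArray: [
    { name: "name1", someNestedArray: [{ name: "value" }, { name: "delete me" }] },
  ],
};

// Dot notation: true when any inner element of any outer element has the name.
function matchesDotNotation(d, name) {
  return d.someArray.some((outer) =>
    outer.someNestedArray.some((inner) => inner.name === name)
  );
}

// Nested $elemMatch: all conditions must hold on the same outer element
// and on the same inner element.
function matchesElemMatch(d, outerName, innerName, otherField) {
  return d.someArray.some(
    (outer) =>
      outer.name === outerName &&
      outer.someNestedArray.some(
        (inner) => inner.name === innerName && inner.otherField === otherField
      )
  );
}

console.log(matchesDotNotation(doc, "value")); // true
console.log(matchesDotNotation(doc, "1")); // false, no inner element is named "1"
console.log(matchesElemMatch(doc, "name1", "value", 1)); // false, the sample has no otherField
```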
Modern MongoDB
We can do this by applying $filter and $map here. The $map is really needed because the "inner" array can change as a result of the "filtering", and the "outer" array of course does not match the conditions when the "inner" was stripped of all elements.
Again following the example of actually having multiple properties to match within each array:
db.mycollection.aggregate([
{ "$match": {
"someArray": {
"$elemMatch": {
"name": "name1",
"someNestedArray": {
"$elemMatch": {
"name": "value",
"otherField": 1
}
}
}
}
}},
{ "$addFields": {
"someArray": {
"$filter": {
"input": {
"$map": {
"input": "$someArray",
"as": "sa",
"in": {
"name": "$$sa.name",
"someNestedArray": {
"$filter": {
"input": "$$sa.someNestedArray",
"as": "sn",
"cond": {
"$and": [
{ "$eq": [ "$$sn.name", "value" ] },
{ "$eq": [ "$$sn.otherField", 1 ] }
]
}
}
}
}
}
},
"as": "sa",
"cond": {
"$and": [
{ "$eq": [ "$$sa.name", "name1" ] },
{ "$gt": [ { "$size": "$$sa.someNestedArray" }, 0 ] }
]
}
}
}
}}
])
Therefore on the "outer" array the $filter actually looks at the $size of the "inner" array after it was "filtered" itself, so you can reject those results where the whole inner array does in fact match nothing.
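The order of operations (filter the inner array first, then decide whether the outer element survives) can be imitated in plain JavaScript. This is a sketch under assumptions: filterNested is a hypothetical helper, and the otherField values and the "name2" element in the sample are invented so that the conditions have something to act on.

```javascript
// Plain JavaScript imitation of the $addFields stage above.
function filterNested(doc) {
  const someArray = doc.someArray
    // $map: rebuild each outer element with its inner array filtered
    .map((sa) => ({
      name: sa.name,
      someNestedArray: sa.someNestedArray.filter(
        (sn) => sn.name === "value" && sn.otherField === 1
      ),
    }))
    // outer $filter: right name and a non-empty inner array after filtering
    .filter((sa) => sa.name === "name1" && sa.someNestedArray.length > 0);
  // $addFields overwrites someArray and keeps the other fields
  return { ...doc, someArray };
}

const doc = {
  _id: "777",
  someKey: "someValue",
  someArray: [
    {
      name: "name1",
      someNestedArray: [
        { name: "value", otherField: 1 },
        { name: "delete me", otherField: 1 },
      ],
    },
    { name: "name2", someNestedArray: [{ name: "value", otherField: 1 }] },
  ],
};

console.log(JSON.stringify(filterNested(doc), null, 2));
```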
Older MongoDB
In order to "project" only the matched element, you need the .aggregate() method:
db.mycollection.aggregate([
// Match possible documents
{ "$match": {
"someArray.someNestedArray.name": "value"
}},
// Unwind each array
{ "$unwind": "$someArray" },
{ "$unwind": "$someArray.someNestedArray" },
// Filter just the matching elements
{ "$match": {
"someArray.someNestedArray.name": "value"
}},
// Group to inner array
{ "$group": {
"_id": {
"_id": "$_id",
"name": "$someArray.name"
},
"someKey": { "$first": "$someKey" },
"someNestedArray": { "$push": "$someArray.someNestedArray" }
}},
// Group to outer array
{ "$group": {
"_id": "$_id._id",
"someKey": { "$first": "$someKey" },
"someArray": { "$push": {
"name": "$_id.name",
"someNestedArray": "$someNestedArray"
}}
}}
])
That allows you to "filter" the matches in nested arrays for one or more results within the document.
You can also try something like below:
db.collection.aggregate(
{ $unwind: '$someArray' },
{
$project: {
'filteredValue': {
$filter: {
input: "$someArray.someNestedArray",
as: "someObj",
cond: { $eq: [ '$$someObj.name', 'delete me' ] }
}
}
}
}
)
Consider the following document:
{
"item1" : [
{
"a" : 1,
"b" : 2
}
],
"item2" : [ "a", "b" ]
}
The following query:
db.test.aggregate([
{ "$project": { "items": { "$setIntersection": [ "$item1", "$item2" ] } }}
])
returns the expected result:
{ "_id" : ObjectId("5710785387756a4a75cbe0d1"), "items" : [ ] }
If the document looks like this:
{ "item2" : [ "a", "b" ] }
Then:
db.test.aggregate([ { "$project": {
"a": { "$setIntersection": [ "$item2", [ "a" ] ] } } }
])
Yields:
{ "_id" : ObjectId("5710785387756a4a75cbe0d1"), "a" : [ "a" ] }
But
db.test.aggregate([
{ "$project": { "items": { "$setIntersection": [ "$item2", [ { "a" : 1, "b" : 2 } ] ] } } }
])
failed with :
"errmsg" : "field inclusion is not allowed inside of $expressions"
And:
db.test.aggregate([ { "$project": {
"items": { "$setIntersection": [ "$item2", [ { "a": "b" } ] ] } } }
])
failed with:
"errmsg" : "FieldPath 'b' doesn't start with $"
The only way to make this work is to use the $literal operator.
Why should we use the $literal operator if $setIntersection arguments are array of sub-documents and not a field in the document?
This would appear to be an artifact of MongoDB 3.2 which incorporated a change that allows arrays to be notated while directly interpolating properties of the document.
For example, with a document like:
{ "a": 1, "b": 2, "c": 3, "d": 4 }
Then you are "now" allowed to notate those elements inside an array, like:
db.collection.aggregate([
{ "$project": {
"array": [
{ "field1": "$a", "field2": "$b" },
{ "field1": "$c", "field2": "$d" }
]
}}
])
In previous versions (in this case MongoDB 2.6) you would instead need to use this $map expression:
db.collection.aggregate([
{ "$project": {
"array": {
"$map": {
"input": ["A","B"],
"as": "el",
"in": {
"$cond": {
"if": { "$eq": [ "$$el", "A" ] },
"then": { "field1": "$a", "field2": "$b" },
"else": { "field1": "$c", "field2": "$d" }
}
}
}
}
}}
])
Or in prior versions to that then something a bit more long winded using $unwind and $group, but the same basic principle of transposing a "source" array with other data. The main point though is the change in notation allowed in MongoDB 3.2, which would otherwise "error" in a prior version.
Therefore in a prior version, say MongoDB 2.6.x where $setIntersection is supported then the following works just fine, since all values are considered "literal" unless actually referencing an array present in the document:
db.collection.aggregate([
{ "$project": {
"a": {
"$setIntersection": [
[{"a": 1}],
[{"a": 1}]
]
}
}}
])
Provided of course that "collection" as a collection actually has something in it.
But since MongoDB 3.2 allows a different syntax for "interpolated arrays", it now expects the "right side" to evaluate to a property from the document or other valid expression. So now the $literal syntax is required:
db.collection.aggregate([
{ "$project": {
"a": {
"$setIntersection": [
{ "$literal": [{"a": 1}] },
{ "$literal": [{"a": 1}] }
]
}
}}
])
This generally comes down to the saying that "you can't have your cake and eat it too". The "new" syntax allows you to express array content with "interpolations" in a nice way without resorting to other expressions to "coerce" the content into an array form.
The consequence of this is that every such expression is now expecting "values" to resolve to a property or expression rather than being directly considered a "literal", and where you mean that to be so, you are now required to express that using the $literal operator.
So it is in fact a "breaking" change in allowed syntax between versions. But one that most people should easily live with.
This appears to be one of the compatibility changes in MongoDB 3.2, and is therefore the expected behavior, as mentioned in the Aggregation Compatibility Changes in MongoDB 3.2:
Array elements are no longer treated as literals in the aggregation pipeline. Instead, each element of an array is now parsed as an expression. To treat the element as a literal instead of an expression, use the $literal operator to create a literal value.
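One way to keep this readable when a pipeline contains several constant arrays is a tiny wrapper around $literal. The sketch below only builds the stage object in plain JavaScript; the helper name literal is hypothetical, and the field names come from the question.

```javascript
// Hypothetical helper: wrap a constant so the pipeline parser treats it as data
// instead of parsing its contents as expressions or field paths.
const literal = (value) => ({ $literal: value });

const projectStage = {
  $project: {
    items: { $setIntersection: ["$item2", literal([{ a: "b" }])] },
  },
};

console.log(JSON.stringify(projectStage));
```

The resulting object could be passed as a stage to db.test.aggregate([ projectStage ]).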
When querying mongodb, is it possible to process ("project") the result so as to perform array concatenation?
I actually have 2 different scenarios:
(1) Arrays from different fields, e.g.:
Given:
{companyName:'microsoft', managers:['ariel', 'bella'], employees:['charlie', 'don']}
{companyName:'oracle', managers:['elena', 'frank'], employees:['george', 'hugh']}
I'd like my query to return each company with its 'managers' and 'employees' concatenated:
{companyName:'microsoft', allPersonnel:['ariel', 'bella','charlie', 'don']}
{companyName:'oracle', allPersonnel:['elena', 'frank','george', 'hugh']}
(2) Nested arrays, e.g.:
Given the following docs, where employees are separated into nested arrays (never mind why, it's a long story):
{companyName:'microsoft', personnel:[ ['ariel', 'bella'], ['charlie', 'don'] ]}
{companyName:'oracle', personnel:[ ['elena', 'frank'], ['george', 'hugh'] ]}
I'd like my query to return each company with a flattened 'personnel' array:
{companyName:'microsoft', allPersonnel:['ariel', 'bella','charlie', 'don']}
{companyName:'oracle', allPersonnel:['elena', 'frank','george', 'hugh']}
I'd appreciate any ideas, using either 'find' or 'aggregate'
Thanks a lot :)
Of course, in modern MongoDB releases we can simply use $concatArrays here:
db.collection.aggregate([
{ "$project": {
"companyName": 1,
"allPersonnel": { "$concatArrays": [ "$managers", "$employees" ] }
}}
])
Or for the second form with nested arrays, using $reduce in combination:
db.collection.aggregate([
{ "$project": {
"companyName": 1,
"allPersonnel": {
"$reduce": {
"input": "$personnel",
"initialValue": [],
"in": { "$concatArrays": [ "$$value", "$$this" ] }
}
}
}}
])
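The $reduce expression above has a direct counterpart in Array.prototype.reduce, which may help when reading it: "$$value" is the accumulator and "$$this" is the current inner array. A minimal plain JavaScript sketch, using the first sample company:

```javascript
const company = {
  companyName: "microsoft",
  personnel: [
    ["ariel", "bella"],
    ["charlie", "don"],
  ],
};

// initialValue: [] and "in": $concatArrays of $$value and $$this
const allPersonnel = company.personnel.reduce(
  (value, current) => value.concat(current),
  []
);

console.log(allPersonnel);
```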
There is also the $setUnion operator available to the aggregation framework (MongoDB 2.6 and later). The constraint here is that these are "sets", so all the members are actually "unique" as a "set" requires:
db.collection.aggregate([
{ "$project": {
"companyName": 1,
"allPersonnel": { "$setUnion": [ "$managers", "$employees" ] }
}}
])
So that is cool, as long as all are "unique" and you are in singular arrays.
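The "unique" constraint is easy to see in a plain JavaScript imitation. This is only a sketch: setUnion is a hypothetical helper, and the sample arrays are invented so that one name appears in both.

```javascript
// Hypothetical helper imitating $setUnion: distinct members of all arrays.
function setUnion(...arrays) {
  return [...new Set(arrays.flat())];
}

const managers = ["ariel", "bella"];
const employees = ["bella", "charlie"];

// Plain concatenation keeps the repeated "bella"
console.log(managers.concat(employees));

// The union keeps each name once
console.log(setUnion(managers, employees));
```

MongoDB does not promise any particular order for the members of the resulting set, unlike the insertion order seen in this sketch.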
In the alternate case you can always process with $unwind and $group. The personnel nested array needs a simple double unwind:
db.collection.aggregate([
{ "$unwind": "$personnel" },
{ "$unwind": "$personnel" },
{ "$group": {
"_id": "$_id",
"companyName": { "$first": "$companyName" },
"allPersonnel": { "$push": "$personnel" }
}}
])
Or the same thing as the first one for versions earlier than MongoDB 2.6 where the "set operators" did not exist:
db.collection.aggregate([
{ "$project": {
"type": { "$const": [ "M", "E" ] },
"companyName": 1,
"managers": 1,
"employees": 1
}},
{ "$unwind": "$type" },
{ "$unwind": "$managers" },
{ "$unwind": "$employees" },
{ "$group": {
"_id": "$_id",
"companyName": { "$first": "$companyName" },
"allPersonnel": {
"$addToSet": {
"$cond": [
{ "$eq": [ "$type", "M" ] },
"$managers",
"$employees"
]
}
}
}}
])
Using the example zipcodes collection, I have a query like this:
db.zipcodes.aggregate([
{ "$match": {"state": {"$in": ["PA","NY"]}}},
{ "$group": { "_id": { "city": "$city" }, "ZipsPerCity": {"$addToSet": "$_id"}}},
{ "$match": { "ZipsPerCity" : { "$size": 2 }}},
]).pretty()
This is just an example that looks for cities (in the state of NY and PA) that have 2 zipcodes:
{
"_id" : {
"city" : "BETHLEHEM"
},
"ZipsPerCity" : [
"18018",
"18015"
]
}
{
"_id" : {
"city" : "BEAVER SPRINGS"
},
"ZipsPerCity" : [
"17843",
"17812"
]
}
Now suppose that I want to compare "BEAVER SPRINGS" zip codes to "BETHLEHEM" zip codes, using the "$setDifference" set operator? I tried using the "$setDifference" operator in a $project operator, like this:
db.zipcodes.aggregate([
{ "$match": { "state": {"$in": ["PA","NY"]}}},
{ "$group": { "_id": { "city": "$city" }, "ZipsPerCity": { "$addToSet": "$_id" }}},
{ "$match": { "ZipsPerCity" : { $size: 2 }}},
{ "$project": {
"int": { "$setDifference":[
"$_id.city.BETHLEHEM.ZipsPerCity",
"$_id.city.BEAVER SPRINGS.ZipsPerCity"
]}
}}
]).pretty()
That doesn't even look right, let alone produce results. No errors though.
How would you refer to a couple of arrays built using $addToSet like this, using $setDifference (or any of the set operators)?
The first thing about what you are trying to do here is that the arrays you want to compare are actually in two different documents. All of the aggregation framework operators in fact work on only one document at a time, with the exception of $group which is meant to "aggregate" documents and possibly $unwind which essentially turns one document into many.
In order to compare you would need the data to occur in one document, or at least be "paired" in some way. So there is a technique to do that:
db.zipcodes.aggregate([
{ "$match": {"state": { "$in": [ "PA","NY" ] } }},
{ "$group": {
"_id": "$city",
"ZipsPerCity": { "$addToSet": "$_id"}
}},
{ "$match": { "ZipsPerCity" : { "$size": 2 } }},
{ "$group": {
"_id": null,
"A": { "$min": {
"$cond": [
{ "$eq": [ "$_id", "BETHLEHEM" ] },
{ "city": "$_id", "ZipsPerCity": "$ZipsPerCity" },
false
]
}},
"B": { "$min": {
"$cond": [
{ "$eq": [ "$_id", "BEAVER SPRINGS" ] },
{ "city": "$_id", "ZipsPerCity": "$ZipsPerCity" },
false
]
}}
}},
{ "$project": {
"A": 1,
"B": 1,
"C": { "$setDifference": [ "$A.ZipsPerCity", "$B.ZipsPerCity" ] }
}}
])
That is a little contrived and I am well aware that the actual result set has more than two cities, but the point is to illustrate that the arrays/sets sent to the "set operators" such as $setDifference need to be in the same document.
The result here compares the "left" array with the "right" array, returning the members from the "left" that are different to the "right". Both sets are unique here with no overlap so the results should be expected:
{
"_id" : null,
"A" : {
"city" : "BETHLEHEM",
"ZipsPerCity" : [
"18018",
"18015"
]
},
"B" : {
"city" : "BEAVER SPRINGS",
"ZipsPerCity" : [
"17843",
"17812"
]
},
"C" : [
"18018",
"18015"
]
}
This is really better illustrated with actual "sets" with common members. So this document:
{ "A" : [ "A", "A", "B", "C", "D" ], "B" : [ "B", "C" ] }
Responds to $setDifference:
{ "C" : [ "A", "D" ] }
And $setEquals:
{ "C" : false }
$setIntersection:
{ "C" : [ "B", "C" ] }
$setUnion:
{ "C" : [ "B", "D", "C", "A" ] }
$setIsSubset reversing the order to $B, $A:
{ "C" : true }
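The results listed above can be reproduced with a plain JavaScript imitation of the set operators. These one-line helpers are hypothetical stand-ins that work on simple values such as strings, not the MongoDB implementations.

```javascript
const A = ["A", "A", "B", "C", "D"];
const B = ["B", "C"];

// Hypothetical stand-ins for the set operators, for simple values only
const unique = (arr) => [...new Set(arr)];
const setDifference = (x, y) => unique(x).filter((v) => !y.includes(v));
const setIntersection = (x, y) => unique(x).filter((v) => y.includes(v));
const setUnion = (x, y) => unique([...x, ...y]);
const setIsSubset = (x, y) => unique(x).every((v) => y.includes(v));
const setEquals = (x, y) => setIsSubset(x, y) && setIsSubset(y, x);

console.log(setDifference(A, B)); // members of A that are not in B
console.log(setEquals(A, B));
console.log(setIntersection(A, B));
console.log(setUnion(A, B));
console.log(setIsSubset(B, A)); // arguments reversed, as in the text
```

The member order printed by this sketch follows insertion order, whereas MongoDB leaves the order of set results unspecified.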
The other set operators $anyElementTrue and $allElementsTrue are likely most useful when used along with the $map operator which can re-shape arrays and evaluate conditions against each element.
A very good usage of $map is alongside $setDifference, where you can "filter" array contents without using $unwind:
db.arrays.aggregate([
{ "$project": {
"A": {
"$setDifference": [
{
"$map": {
"input": "$A",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el", "A" ] },
"$$el",
false
]
}
}
},
[false]
]
}
}}
])
That can be very handy when you have a lot of results in the pipeline and you do not want to "expand" out all of those results by "unwinding" the array. But note that this is a "set" and as such only one element matching "A" is returned:
{ "A" : ["A"] }
So the things to keep in mind here are that:
The operators work only within the "same" document at a time.
The results are generally "sets", which means they are both "unique" and "un-ordered".
Overall that should be a decent run-down on what the set operators are and how you use them.
I have a database of 30mb size, and it has 300 documents which are stored in a single collection, and their size vary from 1mb to 10kb. I am using the new aggregation framework which comes with 2.6 and I do not have any indexes.
I have an aggregation pipeline as following:
1. $match > first query match
2. $project > exclude some fields for efficiency
3. $unwind > unwind one of the arrays
4. $unwind > unwind second array
5. $project > projection to find matching fields among two arrays with $eq
6. $match > same:true
7. $group > put the unwinded arrays together
8. $limit(50)
This pipeline above takes 30 seconds. If I remove $limit, it takes ages. My question is:
Database size is only 30MB, and pipeline is not complicated at all. Why is it taking so long? Any ideas on that?
EDIT
My schema is as following:
{
username: string (max 20 chars)
userid : string max 20 chars
userage : string max 20 chars
userobj1: array of objects, length: ~300-500
// example of userobj1:
[
{
innerobj1: array of objects, length: ~30-50
innerobj2: array of objects, length: ~100-200
userinfo1: string max 20 chars
userinfo2: string max 20 chars
userinfo3: string max 20 chars
userinfo4: string max 20 chars
} ...
]
userobj2: same as userobj1
userobj3: same as userobj1
userobj4: same as userobj1
}
this document above has inner objects up to 3-4 levels. Sorry that I cannot provide an example but the alias should be enough. Example query is as following:
1. $match:
$and : [
{userobj1: $elemMatch: {userinfo1:a}},
{userobj1: $elemMatch: {userinfo4:b}}
]
2. $project {username:1, userid:1, userobj1:1, userobj2:1}
3. $unwind userobj1
4. $unwind userobj2
5. $project
{
username:1,
userid:1,
userobj1:1,
userobj2:1,
userobj3:1,
userobj4:1,
"same" : {
$eq: [ userobj3.userinfo4, userobj4.userinfo4 ]
}
}
6. $match {same:true}
7. $group all arrays back
8. limit 50.
There is something here that I just don't get about what you are actually trying to do here. So please bear with me on the possible actual questions and answers that I see.
Considering this simplified data set to your case:
{
"obj1": [
{ "a": "a", "b": "b" },
{ "a": "a", "b": "c" }
],
"obj2": [
{ "a": "c", "b": "b" },
{ "a": "c", "b": "c" }
]
},
{
"obj1": [
{ "a": "a", "b": "b" }
],
"obj2": [
{ "a": "a", "b": "c" }
]
}
Q: "Are you not just trying to match the documents with { "a": "a", "b": "b" } in "obj1" and also { "b": "b" } in "obj2"?"
If that is the case then this is just a simple query with .find():
db.collection.find({
"obj1": {
"$elemMatch": { "a": "a", "b": "b" }
},
"obj2.b": "b"
})
Matches only one of those documents that meets the conditions, in this case just the one:
{
"obj1": [
{ "a": "a", "b": "b" },
{ "a": "a", "b": "c" }
],
"obj2": [
{ "a": "c", "b": "b" },
{ "a": "c", "b": "c" }
]
}
Q: "Are you possibly trying to find the positions in the array where your conditions are true?"
If so there are some operators available in MongoDB 2.6 that help you without using $unwind:
db.objects.aggregate([
{ "$match": {
"obj1": {
"$elemMatch": { "a": "a", "b": "b" }
},
"obj2.b": "b"
}},
{ "$project": {
"obj1": 1,
"obj2": 1,
"match1": {
"$map": {
"input": "$obj1",
"as": "el",
"in": {
"$and": [
{ "$eq": [ "$$el.a", "a" ] },
{ "$eq": [ "$$el.b", "b" ] }
]
}
}
},
"match2": {
"$map": {
"input": "$obj2",
"as": "el",
"in": {
"$eq": [ "$$el.b", "b" ]
}
}
}
}}
])
Gives you:
{
"obj1": [
{ "a": "a", "b": "b" },
{ "a": "a", "b": "c" }
],
"obj2": [
{ "a": "c", "b": "b" },
{ "a": "c", "b": "c" }
],
"match1" : [
true,
false
],
"match2" : [
true,
false
]
}
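If actual index positions are wanted rather than a true/false array, the boolean array can be turned into positions on the client. A minimal plain JavaScript sketch, imitating the $map above for obj1 of the matched document; the variable names are made up.

```javascript
const obj1 = [
  { a: "a", b: "b" },
  { a: "a", b: "c" },
];

// Same test as the $and of two $eq conditions inside $map
const match1 = obj1.map((el) => el.a === "a" && el.b === "b");

// Collect the index of every true entry
const positions = match1.flatMap((hit, index) => (hit ? [index] : []));

console.log(match1);
console.log(positions);
```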
Q: "Or are you possibly trying to "filter" only the matching array elements to those conditions?"
You can do this with more set operators in MongoDB 2.6 without using $unwind:
db.objects.aggregate([
{ "$match": {
"obj1": {
"$elemMatch": { "a": "a", "b": "b" }
},
"obj2.b": "b"
}},
{ "$project": {
"obj1": {
"$setDifference": [
{ "$map": {
"input": "$obj1",
"as": "el",
"in": {
"$cond": [
{ "$and": [
{ "$eq": [ "$$el.a", "a" ] },
{ "$eq": [ "$$el.b", "b" ] }
]},
"$$el",
false
]
}
}},
[false]
]
},
"obj2": {
"$setDifference": [
{ "$map": {
"input": "$obj2",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.b", "b" ] },
"$$el",
false
]
}
}},
[false]
]
}
}}
])
And the result:
{
"obj1": [
{ "a": "a", "b": "b" }
],
"obj2": [
{ "a": "c", "b": "b" }
]
}
The last entry there is the cutest which combines $cond, $map and $setDifference to do some complex filtering of the objects in the array in order to filter just the matches to the conditions. You previously would have to $unwind and $match to get those results.
So neither $unwind nor $group is required to actually get to any of these results, and those stages are what is really killing you. Also your big "pass through" on the "unwound" arrays with $eq suggests you are trying to get to the end result of one of the above, but the way you have implemented it is very costly.
Also try to have an index on the matched element within one of those arrays, as that is going to reduce your working results down as far as possible. In all cases it's going to improve things even if you cannot have a compound "multi-key" index due to the restrictions there.
Anyhow, I am hoping that at least something here either matches your intent or is at least close to what you are trying to do.
Since your comments went this way, matching values of "obj1.a" to "obj2.b" without the filtering is not much different to the general cases shown.
db.objects.aggregate([
{ "$project": {
"match": {
"$size": {
"$setIntersection": [
{ "$map": {
"input": "$obj1",
"as": "el",
"in": { "$concat": ["$$el.a",""] }
}},
{ "$map": {
"input": "$obj2",
"as": "el",
"in": { "$concat": ["$$el.b",""] }
}}
]
}
}
}},
{ "$match": { "match": { "$gte": 1 } } }
])
All simply done without using $unwind.
I know this is an old question, but it looks like a simple answer was never reached and it involves using an expression that was available in 2.6 so it would have worked back then too. You don't need to do any $unwinding or complex $mapping you just need to do a $setIntersection on the two arrays that you want to find a match in.
Using the example data from the very long answer:
db.foo.aggregate(
{$match:{"obj1.a":"a"}},
{$project:{keep:{$setIntersection:["$obj1.b","$obj2.b"]},obj1:1,obj2:1}},
{$match:{keep:{$ne:[]}}})
{ "_id" : ObjectId("588a8206c01d80beca3a8e45"), "obj1" : [ { "a" : "a", "b" : "b" }, { "a" : "a", "b" : "c" } ], "obj2" : [ { "a" : "c", "b" : "b" }, { "a" : "c", "b" : "c" } ], "keep" : [ "b", "c" ] }
Only one of the two documents is kept, the one that had two values of "b" in both obj1 and obj2 arrays.
In your original "syntax" the $project stage would be
same: {$setIntersection: [ '$userobj3.userinfo4', '$userobj4.userinfo4' ]}
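What that three-stage pipeline does to the two sample documents can be traced in plain JavaScript. This is a sketch of the logic only, run on the simplified data set from the long answer; it is not MongoDB code, and the variable names are made up.

```javascript
const docs = [
  {
    obj1: [{ a: "a", b: "b" }, { a: "a", b: "c" }],
    obj2: [{ a: "c", b: "b" }, { a: "c", b: "c" }],
  },
  {
    obj1: [{ a: "a", b: "b" }],
    obj2: [{ a: "a", b: "c" }],
  },
];

const kept = docs
  // $match: { "obj1.a": "a" }
  .filter((d) => d.obj1.some((el) => el.a === "a"))
  // $project: keep = $setIntersection of "$obj1.b" and "$obj2.b"
  .map((d) => {
    const right = new Set(d.obj2.map((el) => el.b));
    const keep = [...new Set(d.obj1.map((el) => el.b))].filter((v) => right.has(v));
    return { ...d, keep };
  })
  // $match: { keep: { $ne: [] } }
  .filter((d) => d.keep.length > 0);

console.log(JSON.stringify(kept));
```

The second document drops out because its obj1 only has "b" and its obj2 only has "c", so the intersection is empty.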
My guess is that it takes that long because there are no indexes so it does a full collection scan every time it needs a record.
Try adding an index on userinfo1:a and I think you will see a good performance gain. I will also recommend that you remove the AND syntax from the match phase and rewrite it as a list.
I think it would be really helpful for both you and the question to give us the output of the aggregation's explain. In mongo 2.6 you can have explain in aggregation pipeline.
db.collection.aggregate( [ ... stages ...], { explain:true } )