Mongo Aggregation How to match an array inside lookup without using $expr

I have an aggregation pipeline query (I've removed unnecessary stuff) that works when using $expr but doesn't work without it. However, I want to avoid $expr for better performance, so that the indexes will be used. Logically there is a many-to-many relation here between rule and resource. I want to sum the cost of the resources per rule. The problem is matching the resources inside the grouped array without using an expression.
With $expr:
db.collection.aggregate([
  {'$group': {'_id': {'rule_id': '$rule_id'}, 'rule_id': {'$first': '$rule_id'}, 'resources_ids': {'$push': '$resource_id'}}},
  {'$lookup': {
    'from': 'other_collection',
    'let': {'resources_ids': '$resources_ids'},
    'pipeline': [
      {'$match': {
        '$expr': {
          '$and': [
            {'$in': ['$resource_id', '$$resources_ids']}
          ]
        }
      }},
      {'$group': {'_id': {}, 'total_cost': {'$sum': '$cost'}}}
    ],
    'as': 'results'
  }}
])
Without $expr:
db.collection.aggregate([
  {'$group': {'_id': {'rule_id': '$rule_id'}, 'rule_id': {'$first': '$rule_id'}, 'resources_ids': {'$push': '$resource_id'}}},
  {'$lookup': {
    'from': 'cost_data',
    'let': {'resources_ids': '$resources_ids'},
    'pipeline': [
      {'$match': {
        '$and': [
          {'resource_id': {'$in': '$$resources_ids'}}
        ]
      }},
      {'$group': {'_id': {}, 'total_cost': {'$sum': '$cost'}}}
    ],
    'as': 'results'
  }}
])

I think @rickhg12hs's comment gives the right answer. There shouldn't be any need for $expr here. The localField/foreignField syntax works correctly when the localField is an array, without needing to $unwind (or use $expr), as documented here. Therefore the matching component of your $lookup can effectively look like this:
$lookup: {
  from: "foreign",
  localField: "resources_ids",
  foreignField: "resource_id",
  as: "result"
}
You can compare the outputs of this syntax above with the more verbose pipeline/$expr version to see that they are the same.
A few other thoughts come to mind. The first is that you can combine the localField/foreignField syntax with the pipeline syntax so that the second $group can still be nested inside of the $lookup. This would make the final version of the $lookup stage structured as follows:
{
  $lookup: {
    from: "foreign",
    localField: "resources_ids",
    foreignField: "resource_id",
    pipeline: [
      {
        "$group": {
          "_id": {},
          "total_cost": {
            "$sum": "$cost"
          }
        }
      }
    ],
    as: "result"
  }
}
Playground demonstration of that component is here.
The second thing is that using an index to perform the $lookups is important and will likely improve performance, but it may not make this operation "fast". As written, this aggregation will perform a full collection scan to process all of the documents in the source collection. (You will still see that COLLSCAN in the explain output from this source collection even if the index in the other collection is used for the $lookup).
Finally, the index on the other collection should probably be { resource_id: 1, cost: 1 }. This should allow the database to cover the query when doing the $lookups and avoid fetching those documents altogether.
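Putting the pieces above together, here is a minimal sketch of the whole pipeline plus the suggested index. The collection names (collection, cost_data) and field names are taken from the question and are assumptions about the real schema; the $group stage is simplified to a plain rule_id key. Combining localField/foreignField with pipeline in one $lookup needs MongoDB 5.0 or later.

```javascript
// Sketch only: names follow the question; "cost_data" as the foreign
// collection is an assumption.

// Compound index suggested above, so the $lookup can be covered.
const indexSpec = { resource_id: 1, cost: 1 };

const pipeline = [
  // Collect the resource ids that belong to each rule.
  {
    $group: {
      _id: "$rule_id",
      resources_ids: { $push: "$resource_id" }
    }
  },
  // An array localField matches foreign documents whose resource_id
  // equals any element of the array; no $expr is involved.
  {
    $lookup: {
      from: "cost_data",
      localField: "resources_ids",
      foreignField: "resource_id",
      pipeline: [
        { $group: { _id: null, total_cost: { $sum: "$cost" } } }
      ],
      as: "results"
    }
  }
];

// In mongosh, against a live deployment:
// db.cost_data.createIndex(indexSpec);
// db.collection.aggregate(pipeline);
```

Running explain on the aggregation should show whether the index on the foreign collection is actually used for the $lookup.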
Edit to address this comment:
$expr is not required inside the match. I've done this already. However, here because of the array I am building through the pipeline it doesn't let me use it in the match without an $expr.
This is not correct. Specifically the source of the array isn't relevant here. Whether the array is a field in the source document directly or generated in an earlier pipeline stage doesn't matter to the $lookup stage at all. In fact, it won't even know where that array comes from, just that it is a field in the generated document that is passed to it.
Rather, what you are describing is the behavior of $match itself. From the documentation:
$match takes a document that specifies the query conditions. The query syntax is identical to the read operation query syntax; i.e. $match does not accept raw aggregation expressions. Instead, use a $expr query expression to include aggregation expression in $match.
Said another way, you presently cannot reference any fields from the document (regardless of where they come from or what type and value they have) without $expr.
But that fact should mostly be irrelevant for your use case. You can use the localField/foreignField syntax for this array matching. If you need to match on additional filters then you can also leverage the let/pipeline syntax in the same $lookup. Here is an arbitrary demonstration of that (note the _id: 4 document doesn't match due to the mismatched otherVal).
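A sketch of that combination might look like the stage below. The otherVal field comes from the demonstration mentioned above; the variable name sourceOtherVal and the collection name are hypothetical. The array match stays in localField/foreignField, and only the extra filter uses $expr.

```javascript
// Sketch only: "foreign" and "sourceOtherVal" are illustrative names.
const lookupStage = {
  $lookup: {
    from: "foreign",
    // Array matching is handled here, without $expr.
    localField: "resources_ids",
    foreignField: "resource_id",
    // Expose a field of the source document to the sub-pipeline.
    let: { sourceOtherVal: "$otherVal" },
    pipeline: [
      // Extra filter: keep only foreign documents whose otherVal
      // equals the source document's otherVal.
      { $match: { $expr: { $eq: ["$otherVal", "$$sourceOtherVal"] } } }
    ],
    as: "result"
  }
};

// In mongosh (MongoDB 5.0+): db.collection.aggregate([lookupStage]);
```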
It is also worth noting that $expr itself does not preclude the usage of indexes in general. One current exception, unfortunately, seems to be with $in (reference). Again though, that shouldn't matter for you if you place that part of the $lookup matching into the localField/foreignField parameters.

Related

Aggregate method in MongoDB Compass?

As stated in the title, I'm having some problems querying from MongoDB Compass using the aggregate method. I have a collection of documents in this form:
{"Array":[{"field":"val","field2":"val2"},{"field":"val","field2":"val2"},{"field":"val","field2":"val2"},{"field":"val","field2":"val2"},{"field":"val","field2":"val2"},...]}
Using the mongo shell or Studio 3T, I query it with the aggregate method, for example:
db.collection.aggregate([
{ $match: {"Array.field": "val"}},
{ $unwind: "$Array"},
{ $match: {"Array.field": "val"}},
{ $group: {_id: null, count: {$sum:NumberInt(1)}, Array: {$push: "$Array"}}},
{ $project: {"N. Hits": "$count", Array:1}}
])
Here I look for the elements of Array whose field value is "val" and count them. This works perfectly, but I don't know how to do the same in MongoDB Compass.
In the query bar I have 'filter', 'project' and 'sort' and I can run the usual queries, but I don't know how to use the aggregate method.
Thanks

You are looking at the Documents tab which is restricted for querying documents.
Take a look in the second tab called Aggregations where you can do your aggregation pipelines, as usual.
For further information please visit the Aggregation Pipeline Builder documentation.

The $in operator inside $project, $match or find()

I have a find query that uses $in to check whether the specified array is contained within the collection string array:
db.Doc.find({ tags: { '$in': ['tag1','tag2'] } })
I am in the process of refactoring this query to use the aggregation framework, but I can't find the equivalent $in comparison operator at the $project or $match aggregation stages.
Is it possible to use the $in comparison operator at the $project or $match stages of an aggregation query?

To answer your question: yes, but not as you would expect. It is possible to use the $in operator at the $project or $match stages of an aggregation query, but the usage and the purpose aren't quite the same in each.
There are two very different operators that share the name $in, which causes semantic confusion:
Non-aggregational $in: Usually narrows down the results, like a filter. It has no way to add information to the result set if a document doesn't match. It can be used both within the find() collection method and, somewhat confusingly, inside the aggregation $match stage.
Aggregational $in: Usually adds boolean information to the result set, can be used as a logical expression inside $cond, and might also remove some results when used with $redact. It can be used in $project, $addFields, etc., but cannot be used directly within find() or $match (only when wrapped in $expr). The structure is { $in: [ <needle expression>, <array haystack expression> ] }, and the whole expression evaluates to either true or false (I borrowed the needle/haystack terminology from the PHP documentation for in_array to explain it better). So, { $in: [ 'foo', [ 'foo', 'bar', 'baz' ] ] } is true because foo is inside the array.
However, with the non-aggregational $in, a query such as { maybeFooField: { $in: [ 'foo', 'bar', 'baz' ] } } simply narrows down the result set, and it doesn't result in a boolean true or false.
Going back to your refactoring, the question is: what are your intended results? Why did you switch to the aggregation framework in the first place?
If you only want to narrow down or filter out the result set, and then use some other aggregation computations, use the simple non-aggregational $in operator.
db.Doc.aggregate([
{ $match: { tags: {$in: ['tag1','tag2'] } } } // non-aggregational $in
])
However, if you want to add information based on the existence or absence of certain tags, use the aggregational $in operator.
db.Doc.aggregate([
{ $project: { hasTag1: {$in: ['tag1', '$tags'] } } } // aggregational $in: is 'tag1' an element of tags?
])
Note, you have more aggregational operators to play with arrays, like: $setIntersection and $setIsSubset.
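To make the difference concrete, here is a plain JavaScript emulation of the two $in semantics. This is a sketch, not MongoDB code: the helper names queryIn and exprIn and the sample documents are hypothetical, and the emulation only covers scalar candidates. The last constant shows one way to express an "any of these tags" check as an aggregation expression with $setIntersection, assuming tags is always an array.

```javascript
// Query-style $in ({ field: { $in: [...] } }): a predicate that keeps or
// drops a document. For an array field, any element found in the
// candidates counts as a match.
function queryIn(doc, field, candidates) {
  const value = doc[field];
  const values = Array.isArray(value) ? value : [value];
  return values.some((v) => candidates.includes(v));
}

// Expression-style $in ({ $in: [needle, haystack] }): evaluates to a boolean.
function exprIn(needle, haystack) {
  return haystack.includes(needle);
}

// Hypothetical sample documents.
const docs = [
  { _id: 1, tags: ["tag1", "other"] },
  { _id: 2, tags: ["unrelated"] }
];

// Like $match: only matching documents survive.
const filtered = docs.filter((d) => queryIn(d, "tags", ["tag1", "tag2"]));

// Like $project: every document survives and gains a boolean field.
const projected = docs.map((d) => ({ _id: d._id, hasTag1: exprIn("tag1", d.tags) }));

console.log(filtered.map((d) => d._id));
console.log(projected);

// "Has any of these tags" as an aggregation expression.
const anyTagStage = {
  $project: {
    hasAnyTag: {
      $gt: [{ $size: { $setIntersection: ["$tags", ["tag1", "tag2"]] } }, 0]
    }
  }
};
```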

The query: db.Doc.find({ tags: { '$in': ['tag1','tag2'] } }) is equivalent to:
db.Doc.aggregate([
{$match:{tags: {$in: ['tag1','tag2'] }}}
])
When you use $in in a projection, it takes a value and an array to search, for example:
db.Doc.aggregate([
{$project:{tags: {$in: ['tag1', '$tags'] }}}
])
The result will be tags: true or tags: false depending on whether 'tag1' is found in the array.

$project multiple fields into one field

I'm working with MongoDB 3.0 (we won't be upgrading until next year.) I have a requirement to get a list of unique values across multiple fields in a collection. The fields have the same value most of the time. This can be accomplished in version 3.2 by something like this:
db.mydata.aggregate([
{'$project': {'combined_users': ['$user1', '$user2']}},
{'$unwind': '$combined_users'},
{'$group': {_id: 1, combined_users: {$addToSet: '$combined_users'}}}
])
The issue is that in version 3.0 we get "disallowed field type Array in..." at combined_users.
How do I accomplish the same thing in Mongo 3.0?

You need to use the $setUnion operator
db.mydata.aggregate([
{'$project': { 'combined_users': { "$setUnion": ['$user1', '$user2'] }}}
])

How to return documents where two fields have same value [duplicate]

Is it possible to find only those documents in a collections with same value in two given fields?
{
_id: 'fewSFDewvfG20df',
start: 10,
end: 10
}
As here start and end have the same value, this document would be selected.
I think about something like...
Collection.find({ start: { $eq: end } })
... which wouldn't work, as end has to be a value.

You can use $expr in MongoDB 3.6 and later to compare two fields from the same document.
db.collection.find({ "$expr": { "$eq": ["$start", "$end"] } })
or with aggregation
db.collection.aggregate([
{ "$match": { "$expr": { "$eq": ["$start", "$end"] }}}
])

You have two options here. The first one is to use the $where operator.
Collection.find( { $where: "this.start === this.end" } )
The second option is to use the aggregation framework and the $redact operator.
Collection.aggregate([
{ "$redact": {
"$cond": [
{ "$eq": [ "$start", "$end" ] },
"$$KEEP",
"$$PRUNE"
]
}}
])
Which one is better?
The $where operator performs a JavaScript evaluation and can't take advantage of indexes, so a query using $where can degrade your application's performance. See considerations. With $where, each document is converted from BSON to a JavaScript object before the $where operation runs, which adds further overhead. Of course, the query can be improved if you add an indexed filter alongside it. There is also a security risk if you build the query dynamically based on user input.
Like $where, $redact doesn't use indexes and performs a collection scan, but the query performs better with $redact because it uses standard MongoDB operators. That being said, the aggregation option is far better because you can always filter your documents first using the $match operator.
$where here is fine but could be avoided. I also believe that you only need $where when you have a schema design problem. For example, adding another indexed boolean field to the document can be a good option here.

This query is fast, since the fewest function calls are involved:
Collection.find("this.start == this.end");

How to use $elemMatch on aggregate's projection?

This is my object:
{ "_id" : ObjectId("53fdcb6796cb9b9aa86f05b9"), "list" : [ "a", "b" ], "complist" : [ { "a" : "a", "b" : "b" }, { "a" : "c", "b" : "d" } ] }
And this is what I want to accomplish: check if "list" contains a certain element and get only the field "a" from the objects on "complist" while reading the document regardless of any of these values. I'm building a forum system, this is the query that will return the details of a forum. I need to read the forum information while knowing if the user is in the forum's white list.
With a find I can use the query
db.itens.find({},{list:{$elemMatch:{$in:["a"]}}})
to get only the first element that matches a certain value. This way I can just check if the returned array is not empty and I know if "list" contains the value I'm looking for. I can't do it on the query because I want the document regardless of it containing the value I'm looking for in the "list" value. I need the document AND know if "list" has a certain value.
With an aggregate I can use the query
db.itens.aggregate({$project:{"complist.a":1}})
to read only the field "a" of the objects contained in complist. This is going to get the forum's threads basic information, I don't want all the information of the threads, just a couple of things.
But when I try to use the query
db.itens.aggregate({$project:{"complist.b":1,list:{$elemMatch:{$in:["a"]}}}})
to try and do both, it throws me an error saying the operator $elemMatch is not valid.
Am I doing something wrong here with the $elemMatch in aggregate? Is there a better way to accomplish this?

Quite an old question, but literally none of the proposed answers are good.
TLDR:
You can't use $elemMatch in a $project stage, but you can achieve the same result using other aggregation operators like $filter.
db.itens.aggregate([
  {
    $project: {
      compList: {
        $filter: {
          input: "$complist",
          as: "item",
          cond: {$eq: ["$$item.a", 1]}
        }
      }
    }
  }
])
And if you want just the first item from the array that matches the condition, similar to what $elemMatch does, you can incorporate $arrayElemAt.
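A minimal sketch of that $filter plus $arrayElemAt combination, applied to the document from the question, could look like this. The output field name firstMatch is hypothetical. When nothing matches, the filtered array is empty, index 0 is out of bounds, and $arrayElemAt returns no value, so the field is simply absent from the result.

```javascript
// Sketch only: "firstMatch" is an illustrative field name.
const pipeline = [
  {
    $project: {
      // Keep only field "a" of each complist element.
      "complist.a": 1,
      // First element of "list" equal to "a", if any.
      firstMatch: {
        $arrayElemAt: [
          {
            $filter: {
              input: "$list",
              as: "item",
              cond: { $eq: ["$$item", "a"] }
            }
          },
          0
        ]
      }
    }
  }
];

// In mongosh (MongoDB 3.2+): db.itens.aggregate(pipeline);
```

Every document is returned regardless of the match, which is what the question asked for; the presence of firstMatch tells you whether "list" contains the value.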
In Depth Explanation:
First let's understand $elemMatch:
$elemMatch is a query operator. A projection version of it also exists, but that refers to query (find) projection, not to the $project aggregation stage.
So what does this have to do with anything? A $project stage accepts only certain input structures, and the one we want to use is:
<field>: <expression>
What is a valid expression?
Expressions can include field paths, literals, system variables, expression objects, and expression operators. Expressions can be nested.
So we want to use an expression operator, but as you can see from the docs, $elemMatch is not one of them. Hence it's not a valid expression to use in an aggregation $project stage.

For some reason $elemMatch doesn't work in aggregations. You need to use the new $filter operator in Mongo 3.2. See https://docs.mongodb.org/manual/reference/operator/aggregation/filter/

The answer to this question may help.
db.collection_name.aggregate({
"$match": {
"complist": {
"$elemMatch": {
"a": "a"
}
}
}
});

Actually, the simplest solution is to just $unwind your array, then $match the appropriate documents. You can wind-up the appropriate documents again using $group and $push.
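The steps described above can be sketched as the following pipeline, using the field names from the question's document and an assumed match value of "a". Note that, unlike a projection, this approach drops documents in which no array element matches.

```javascript
// Sketch only: the matched value "a" is an assumption for illustration.
const pipeline = [
  // One output document per complist element.
  { $unwind: "$complist" },
  // Keep only the elements of interest.
  { $match: { "complist.a": "a" } },
  // Wind the surviving elements back up into one document per _id.
  {
    $group: {
      _id: "$_id",
      list: { $first: "$list" },
      complist: { $push: "$complist" }
    }
  }
];

// In mongosh: db.itens.aggregate(pipeline);
```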

Although the question is old, here is my contribution for November 2017.
I had similar problem and doing two consecutive match operations worked for me. The code below is a subset of my whole code and I changed elements names, so it's not tested. Anyway this should point you in the right direction.
db.collection.aggregate([
{
"$match": {
"_id": "ID1"
}
},
{
"$unwind": "$sub_collection"
},
{
"$match": {
"sub_collection.field_I_want_to_match": "value"
}
}
])

For aggregations simply use $expr:
db.items.aggregate([
{
"$match": {
"$expr": {"$in": ["a", "$list"]}
}
},
])

Well, it happens you can use "array.field" on a find's projection block.
db.itens.find({},{"complist.b":1,list:{$elemMatch:{$in:["a"]}}})
did what I needed.
