mongodb $or syntax in aggregation pipeline

In MongoDB I found strange behavior of $or; consider the collection below:
{ "_id" : 1, "to" : [ { "_id" : 2 }, { "_id" : 4, "valid" : true } ] }
When aggregating with $match:
db.ooo.aggregate([{$match:{ $or: ['$to', '$valid'] }}])
it throws an "aggregate failed" error (sure, I know how to fix it...):
"ok" : 0,
"errmsg" : "$or/$and/$nor entries need to be full objects",
"code" : 2,
"codeName" : "BadValue"
But if the $or is used in a $cond statement:
db.ooo.aggregate([{ "$redact": {
"$cond": {
"if": { $or: ["$to", "$valid"] },
"then": "$$DESCEND",
"else": "$$PRUNE"
}
}}])
the result is shown and no error is thrown; see mongodb aggregate $redact to filter array elements.
The question is: what's going on with the $or syntax? Why does the same condition not work in $match but work in $cond?
I also looked up the docs:
$cond
If the &lt;boolean-expression&gt; evaluates to true, then $cond evaluates and returns the value of the &lt;true-case&gt; expression. Otherwise, $cond evaluates and returns the value of the &lt;false-case&gt; expression.
The arguments can be any valid expression. For more information on expressions, see Expressions.
$or
Evaluates one or more expressions and returns true if any of the expressions are true. Otherwise, $or returns false.
For more information on expressions, see Expressions.
PS: I'm using MongoDB 3.4.5 but haven't tested other versions.
I don't have a clue...
UPDATE
Based on Neil's answer, I also tried the $filter usage with $or:
1.
db.ooo.aggregate([{ "$project": {
result:{
"$filter": {
"input": "$to",
"as": "el",
"cond": {$or: ["$$el.valid"]}
}
}
}}])
2.
db.ooo.aggregate([{ "$project": {
result:{
"$filter": {
"input": "$to",
"as": "el",
"cond": {$or: "$$el.valid"}
}
}
}}])
3.
db.ooo.aggregate([{ "$project": {
result:{
"$filter": {
"input": "$to",
"as": "el",
"cond": "$$el.valid"
}
}
}}])
In all three $filter examples above the syntax is accepted; the result is shown and no error is thrown.
It seems $or works with field paths directly only in $cond or cond?
Or is this a hacky usage of $or?
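For context, the likely explanation (my reading of the docs, not an authoritative statement): in a $match stage, $or is a query operator and expects an array of predicate documents, while inside $cond it is the aggregation expression $or, which evaluates each operand and tests it for truthiness, so bare field paths are legal there. A plain-JavaScript sketch of that assumed truthiness evaluation:

```javascript
// Sketch of aggregation-expression $or semantics (assumed, not MongoDB
// internals): each operand is evaluated, and missing, null, false, and 0
// all count as falsy.
const isTruthy = (v) => v !== undefined && v !== null && v !== false && v !== 0;

// "$to" and "$valid" resolve as field paths against the current document.
const exprOr = (doc, paths) => paths.some((p) => isTruthy(doc[p.slice(1)]));

const doc = { _id: 1, to: [{ _id: 2 }, { _id: 4, valid: true }] };
const kept = exprOr(doc, ["$to", "$valid"]); // the document has "to", so true
```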

The $redact pipeline stage is the wrong thing for this type of operation. Instead use $filter, which actually "filters" things from arrays:
db.ooo.aggregate([
{ "$addFields": {
"to": {
"$filter": {
"input": "$to",
"as": "t",
"cond": { "$ifNull": [ "$$t.valid", false ] }
}
}
}}
])
Produces:
{
"_id" : ObjectId("594c5b0a212a102096cebf7e"),
"id" : 1,
"to" : [
{
"id" : 4,
"valid" : true
}
]
}
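The $filter logic maps directly onto a plain-JavaScript array filter; a sketch of what the stage computes (illustrative only):

```javascript
// Plain-JS analogue of the $addFields + $filter stage above: keep only array
// elements whose "valid" field is present and truthy, mirroring
// { "$ifNull": [ "$$t.valid", false ] }.
const doc = { _id: 1, to: [{ _id: 2 }, { _id: 4, valid: true }] };

const result = { ...doc, to: doc.to.filter((t) => t.valid ?? false) };
```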
The problem with $redact in this case is that the only way you can actually "redact" from an array is by using $$DESCEND on the false condition. This is recursive and evaluates the expression from the top level of the document downwards. At whatever level the condition is not met, $redact will discard it. No "valid" field in the "top level" means it would discard the whole document, unless we gave an alternate condition.
Since not all array elements have the "valid" field in "addition" to the top level of the document, we cannot even "hack it" to pretend something is there.
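To make the $$DESCEND/$$PRUNE recursion concrete, here is a rough plain-JavaScript model of it (my approximation of the documented behavior, not the server's implementation):

```javascript
// The condition is re-evaluated at the document root and at every embedded
// document; any level that fails the test is discarded ($$PRUNE), otherwise
// processing continues into its children ($$DESCEND).
function redact(node, cond) {
  if (!cond(node)) return undefined; // $$PRUNE this level
  const out = {};
  for (const [k, v] of Object.entries(node)) {
    if (Array.isArray(v)) {
      out[k] = v
        .map((el) => (el && typeof el === "object" ? redact(el, cond) : el))
        .filter((el) => el !== undefined);
    } else if (v && typeof v === "object") {
      const sub = redact(v, cond);
      if (sub !== undefined) out[k] = sub;
    } else {
      out[k] = v;
    }
  }
  return out; // $$DESCEND
}

// The answer's condition: $or of { $ifNull: [ "$to", false ] } and "$valid".
const cond = (d) => d.to !== undefined || d.valid === true;
const result = redact({ _id: 1, to: [{ _id: 2 }, { _id: 4, valid: true }] }, cond);
// result is { _id: 1, to: [ { _id: 4, valid: true } ] }
```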
For example you appear to be trying to do this:
db.ooo.aggregate([
{ "$redact": {
"$cond": {
"if": {
"$or": [
{ "$ifNull": [ "$$ROOT.to", false ] },
"$valid"
]
},
"then": "$$DESCEND",
"else": "$$PRUNE"
}
}}
])
But when you look carefully, that kind of comparison will essentially "always" evaluate to true, no matter how hard you try to hack a condition.
You could do:
db.ooo.aggregate([
{ "$redact": {
"$cond": {
"if": {
"$or": [
{ "$ifNull": [ "$to", false ] },
"$valid"
]
},
"then": "$$DESCEND",
"else": "$$PRUNE"
}
}}
])
Which does correctly remove the element from the array:
{
"_id" : ObjectId("594c5b0a212a102096cebf7e"),
"id" : 1,
"to" : [
{
"id" : 4,
"valid" : true
}
]
}
But it is overkill and adds unnecessary overhead to logic processing when the simple $filter will do in this case. You should only need this form when there are actually "nested" arrays that need to be recursively processed, and all conditions can actually be met at all levels.
The lesson here is to use the correct operators for their designed purpose.

Related

MongoDB skip stages on pipeline?

I want to know if there is any way of skipping stages on the aggregation pipeline; more concretely, to stop and return if one of the $lookup stages finds a match.
I need a query for retrieving "inherited" data from other types and/or groups. In this case I have three different tables: devices_properties, types_properties, and group_properties, which store properties for each device, type, or group.
If a device has a property defined, e.g., geofences, it can be read directly from devices_properties; if not, it is necessary to check its type and/or its group to see if it is defined there. If it is found on its type, then it is not necessary to check the group.
I have a query that works by checking its type/group, and doing a $lookup over the different tables. Then, with a switch, it returns the appropriate document. However, it is not optimal, as many times the property will be located on the first table: devices_properties. In such case, it does 3 unnecessary lookups, as it is not required to check for device type and group, and check for their respective properties. Not sure I explained it correctly.
The query I have right now is the following. Any way to optimize it, i.e., stop after the first $lookup if there is a match?
db.devices.aggregate([
{"$match" : { "_id": "alvarolb#esp32"}},
{"$project" : {
"_id": false,
"asset_group": {"$concat" : ["alvarolb", "#", "$asset_group", ":", "geofences"]},
"asset_type": {"$concat" : ["alvarolb", "#", "$asset_type", ":", "geofences"]}
}},
{"$lookup" : {
"from": "devices_properties",
"pipeline": [
{"$match" : {"_id": "alvarolb#esp32:geofences"}},
],
"as": "device"
}},
{ "$unwind": {
"path": "$device",
"preserveNullAndEmptyArrays": true
}},
{"$lookup" : {
"from": "groups_properties",
"let" : {"asset_group" : "$asset_group"},
"pipeline": [
{"$match" : {"$expr" : { "$eq" : ["$_id", "$$asset_group"]}}}
],
"as": "group"
}},
{ "$unwind": {
"path": "$group",
"preserveNullAndEmptyArrays": true
}},
{"$lookup" : {
"from": "types_properties",
"let" : {"asset_type" : "$asset_type"},
"pipeline": [
{"$match" : {"$expr" : { "$eq" : ["$_id", "$$asset_type"]}}}
],
"as": "type"
}},
{ "$unwind": {
"path": "$type",
"preserveNullAndEmptyArrays": true
}},
{"$project" : {
"value": {
"$switch" : {
"branches" : [
{"case": "$device", "then" : "$device"},
{"case": "$type", "then" : "$type"},
{"case": "$group", "then" : "$group"}
],
"default": {}
}
}
}},
{"$replaceRoot": { "newRoot": "$value"}}
]);
Thanks!
I doubt this particular query requires optimisation, but conditional stages in an aggregation pipeline in general are an interesting question.
First things first: in the first stage you select at most 1 document by an indexed field, which is already quite optimal. All your lookups do the same, so we are talking about a few dozen milliseconds for the whole pipeline even on large collections. Is it worth optimising?
For more generic case when lookups are indeed expensive you can employ a combination of $facet to run conditional pipelines and $concatArrays to merge the results.
The first lookup remains as is:
db.devices.aggregate([
....
{"$lookup" : {
"from": "devices_properties",
"pipeline": [
{"$match" : {"_id": "alvarolb#esp32:geofences"}},
],
"as": "device"
}},
Then we add an indicator of whether it returned any result, so we know if more lookups are needed:
{$addFields:{found: {$size: "$device"}}},
Then we define 2 pipelines in the $facet: one with the next lookup, another without. The switch deciding which one runs is the first $match stage in each pipeline:
{$facet:{
yes:[
{$match: {"$expr" : {$gt:["$found", 0]}}},
],
no:[
{$match: {"$expr" : {$eq:["$found", 0]}}},
{"$lookup" : {
"from": "groups_properties",
"let" : {"asset_group" : "$asset_group"},
"pipeline": [
{"$match" : {"$expr" : { "$eq" : ["$_id", "$$asset_group"]}}}
],
"as": "group"
}}
]
}},
After this stage we have 2 arrays, "yes" and "no"; one of them is always empty. Merge both and convert back to top-level documents:
{$addFields: {yesno: {$concatArrays:["$yes", "$no"]}}},
{$unwind: "$yesno"},
{"$replaceRoot": { "newRoot": "$yesno"}},
Recalculate the indicator of whether we have found anything so far:
{$addFields:{found: {$add: [ "$found", {$size: {$ifNull:["$group", []]}}]}}},
and repeat the same technique for the next lookup:
$facet with $lookup in `types_properties`
$addFields with $concatArrays
$unwind
$replaceRoot
then finalise with the projection/$replaceRoot as in the original pipeline.
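The branch-and-merge idea can be sketched in plain JavaScript (illustrative; `expensiveLookup` is a hypothetical stand-in for the conditional $lookup stage):

```javascript
// Plain-JS model of the $facet / $concatArrays trick: documents that already
// have a result ("yes") skip the expensive stage; the rest ("no") run it.
const expensiveLookup = (doc) => ({ ...doc, group: [{ _id: doc.asset_group }] });

function conditionalStage(docs) {
  const yes = docs.filter((d) => d.found > 0); // lookup already satisfied
  const no = docs.filter((d) => d.found === 0).map(expensiveLookup);
  return [...yes, ...no]; // $concatArrays + $unwind + $replaceRoot
}

const out = conditionalStage([{ found: 1 }, { found: 0, asset_group: "g1" }]);
```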

Compare embedded document to parent field with mongoDB

Consider the following collection, where the parent document has a amount field with the value 100000 and there's an embedded array of documents with the same field amount and the same value.
{
"_id" : ObjectId("5975ce5f05563b6303924914"),
"amount" : 100000,
"offers" : [
{
"amount": 100000
}
]
}
Is there any way to match all objects that have at least one embedded document in offers with the same amount as the parent?
If I for example query this, it works just fine:
find({ offers: { $elemMatch: { amount: 100000 } } })
But I don't know the actual value 100000 in the real query I'm trying to assemble; I would need to use a variable for the parent document's amount field. Something like this:
find({ offers: { $elemMatch: { amount: "parent.amount" } } })
Thankful for any suggestions. I was hoping to do this with $eq or $elemMatch, and to avoid aggregates, but maybe it's not possible.
Thanks!
Standard queries cannot "compare" values in documents. This is actually something you do using .aggregate() and $redact:
db.collection.aggregate([
{ "$redact": {
"$cond": {
"if": {
"$gt": [
{ "$size": {
"$filter": {
"input": "$offers",
"as": "o",
"cond": { "$eq": [ "$$o.amount", "$amount" ] }
}
}},
0
]
},
"then": "$$KEEP",
"else": "$$PRUNE"
}
}}
])
Here we use $filter to compare the values of "amount" in the parent document to those within the array. If at least one is "equal", then we "$$KEEP" the document; otherwise we "$$PRUNE" it.
In most recent versions, we can shorten that using $indexOfArray.
db.collection.aggregate([
{ "$redact": {
"$cond": {
"if": {
"$ne": [
{ "$indexOfArray": [ "$offers.amount", "$amount" ] },
-1
]
},
"then": "$$KEEP",
"else": "$$PRUNE"
}
}}
])
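Both forms reduce to the same membership test; in plain JavaScript the check the pipeline performs looks roughly like this (illustrative):

```javascript
// Plain-JS analogue of the $indexOfArray check: keep documents where the
// parent "amount" appears among the embedded offers' amounts.
const docs = [
  { _id: 1, amount: 100000, offers: [{ amount: 100000 }] },
  { _id: 2, amount: 50000, offers: [{ amount: 60000 }] },
];

const matched = docs.filter(
  (d) => d.offers.map((o) => o.amount).indexOf(d.amount) !== -1
);
```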
If you actually only wanted the "matching array element(s)" as well, then you would add a $filter in projection:
db.collection.aggregate([
{ "$redact": {
"$cond": {
"if": {
"$gt": [
{ "$size": {
"$filter": {
"input": "$offers",
"as": "o",
"cond": { "$eq": [ "$$o.amount", "$amount" ] }
}
}},
0
]
},
"then": "$$KEEP",
"else": "$$PRUNE"
}
}},
{ "$project": {
"amount": 1,
"offers": {
"$filter": {
"input": "$offers",
"as": "o",
"cond": { "$eq": [ "$$o.amount", "$amount" ] }
}
}
}}
])
But the main principle is of course to "reduce" the number of documents returned to only those that actually match the condition as a "first" priority. Otherwise you are just doing unnecessary calculations and work that is taking time and resources, for results that you later would discard.
So "filter" first, and "reshape" second as a priority.
I think that since MongoDB version 3.6 you can actually do this with a simple filter using the $expr operator.
Something along those lines:
find({
$expr: {
$in: [
"$amount",
"$offers.amount"
]
}
})
See a live example on mongoplayground.net

Find Documents by the number of embedded array elements that match condition

I am new to MongoDB and need help in accomplishing my task:
I am using MongoDB to query for actions that were taken by a person. The actions are embedded in the person document like this:
{
"_id" : ObjectId("56447ac0583d4871570041c3"),
"email" : "email#example.net",
"actions" : [
{
"name" : "support",
"created_at" : ISODate("2015-10-17T01:40:35.000Z"),
},
{
"name" : "hide",
"created_at" : ISODate("2015-10-16T01:40:35.000Z")
},
{
"name" : "support",
"created_at" : ISODate("2015-10-17T03:40:35.000Z"),
}
]
}
A person can have many actions with different action names (support and hide are just 2 examples).
I know that I could find all people with at least one support action like this:
db.test.find({'actions.name':'support'})
What I want to do is retrieve all people with at least X support actions. Is this possible without using JavaScript syntax? As people could have hundreds of actions, this would be slow.
So, if I want all people with at least 2 support actions, the only way I know would be using the js syntax:
db.test.find({$where: function() {
return this.actions.filter(function(action){
return action.name === 'support';
}).length >= 2;
}});
Is there an other/better/faster possibility for this query?
Well, the best way to do this is using the .aggregate() method, which provides access to the aggregation pipeline.
You can reduce the size of documents to process on the pipeline using $match operator to filter out all documents that don't match the given criteria.
You need to use the $redact operator to return only documents where the number of elements with name "support" in your array is $gte 2. The $map operator here returns an array of subdocuments that match your criteria, with false for the rest, which you can easily drop using the $setDifference operator. Of course the $size operator returns the size of the array.
db.test.aggregate([
{ "$match": {
"actions.name": "support",
"actions.2": { "$exists": true }
}},
{ "$redact": {
"$cond": [
{ "$gte": [
{ "$size": {
"$setDifference": [
{ "$map": {
"input": "$actions",
"as": "action",
"in": {
"$cond": [
{ "$eq": [ "$$action.name", "support" ] },
"$$action",
false
]
}
}},
[false]
]
}},
2
]},
"$$KEEP",
"$$PRUNE"
]
}}
])
From MongoDB 3.2 this can be handled using the $filter operator.
db.test.aggregate([
{ "$match": {
"actions.name": "support",
"actions.2": { "$exists": true }
}},
{ "$redact": {
"$cond": [
{ "$gte": [
{ "$size": {
"$filter": {
"input": "$actions",
"as": "action",
"cond": { "$eq": [ "$$action.name", "support" ] }
}
}},
2
]},
"$$KEEP",
"$$PRUNE"
]
}}
])
As #BlakesSeven pointed out:
$setDifference is fine as long as the data being filtered is "unique". In this case it "should" be fine, but if any two results contained the same date then it would skew results by considering the two to be one. $filter is the better option when it comes, but if data was not unique it would be necessary to $unwind at present.
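Either pipeline implements the same predicate; in plain JavaScript the check is simply (illustrative only):

```javascript
// Plain-JS analogue of the $match + $redact/$filter pipelines: keep people
// with at least two "support" actions.
const people = [
  {
    _id: "a",
    actions: [{ name: "support" }, { name: "hide" }, { name: "support" }],
  },
  { _id: "b", actions: [{ name: "support" }] },
];

const result = people.filter(
  (p) => p.actions.filter((a) => a.name === "support").length >= 2
);
```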
I haven't benchmarked this against your attempt, but this sounds like a great use case for Mongo's aggregation framework.
db.test.aggregate([
{$unwind: "$actions"},
{$group: {
_id: { _id: "$_id", action: "$actions.name" },
count: {$sum: 1}
}},
{$match: {$and: [{count: {$gte: 2}}, {"_id.action": "support"}]}}
]);
Note that I haven't run this in mongo, so it might have some syntax issues.
The idea behind it is:
unwind the actions array so each element of the array becomes its own document
group the resulting collection by an _id / action-name pair, and count how many we get of each.
match will filter for only things we are interested in.
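The unwind/group/match steps above amount to counting (person, action-name) pairs; a plain-JavaScript sketch of that idea (illustrative only):

```javascript
// Count occurrences per (person, action name) pair, as $unwind + $group do,
// then "match" only support counts >= 2.
const docs = [
  { _id: "a", actions: [{ name: "support" }, { name: "hide" }, { name: "support" }] },
];

const counts = new Map();
for (const d of docs)
  for (const a of d.actions) {
    const key = `${d._id}:${a.name}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }

const matchedIds = [...counts]
  .filter(([key, n]) => key.endsWith(":support") && n >= 2)
  .map(([key]) => key.split(":")[0]);
```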

Aggregating number of fields that match true

I am struggling with an aggregation in mongodb. I have the following type of documents:
{
"_id": "xxxx",
"workHome": true,
"commute": true,
"tel": false,
"weekend": true,
"age":39
},
{
"_id": "yyyy",
"workHome": false,
"commute": true,
"tel": false,
"weekend": true,
"age":32
},
{
"_id": "zzzz",
"workHome": false,
"commute": false,
"tel": false,
"weekend": false,
"age":27
}
Out of this I want to generate an aggregation by the total number of fields that are "true" in the document. There are a total of 4 boolean fields in the document so I want the query to group them together to generate the following output (as examples from e.g. a collection with 100 documents in total):
0:20
1:30
2:10
3:20
4:20
This means: there are 20 documents out of 100 with 'all false', 30 documents with '1x true', 10 documents with '2x true', etc., up to 'all 4 true'.
Is there any way to do this with an $aggregate statement? Right now I am trying to $group by the $sum of 'true' values but can't find a way to get the conditional query to work.
So assuming that the data is consistent with all the same fields as "workHome", "commute", "tel" and "weekend", then you would proceed with a "logical" evaluation such as this:
db.collection.aggregate([
{ "$project": {
"mapped": { "$map": {
"input": ["A","B","C","D"],
"as": "el",
"in": { "$cond": [
{ "$eq": [ "$$el", "A" ] },
"$workHome",
{ "$cond": [
{ "$eq": [ "$$el", "B" ] },
"$commute",
{ "$cond": [
{ "$eq": [ "$$el", "C" ] },
"$tel",
"$weekend"
]}
]}
]}
}}
}},
{ "$unwind": "$mapped" },
{ "$group": {
"_id": "$_id",
"size": { "$sum": { "$cond": [ "$mapped", 1, 0 ] } }
}},
{ "$group": {
"_id": "$size",
"count": { "$sum": 1 }
}},
{ "$sort": { "_id": 1 } }
])
From your simple sample this gives:
{ "_id" : 0, "count" : 1 }
{ "_id" : 2, "count" : 1 }
{ "_id" : 3, "count" : 1 }
To break this down: first, the $map operator here transposes the values of the fields to an array of the same length as the fields themselves. This is done by comparing each element of the "input" to an expected value via $cond, either returning the true condition where there is a match, or moving on to the next condition embedded in the false part of this "ternary" operator. This continues until all logical matches are tried, and results in an array of values from the fields, like so for the first document:
[true,true,false,true]
The next step is to $unwind the array elements for further comparison. This "de-normalizes" into separate documents for each array element, and is usually required in aggregation pipelines when processing arrays.
Once that is done, a $group pipeline stage is invoked in order to assess the "total" of those elements with a true value. The same $cond ternary is used to transform the logical true/false conditions into numeric values here, which are fed to the $sum accumulator for addition.
Since the "grouping key" provided in _id in the $group is the original document _id value, the current totals are per document for those fields that are true. In order to get totals on the "counts" over the whole collection (or selection), a further $group stage is invoked with the grouping key being the returned "size" of the matched true results from each document.
The $sum accumulator used there simply adds 1 for each match on the grouping key, thus "counting" the number of occurrences of each match count.
Finally, $sort by the number-of-matches "key" to produce some order in the results.
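The whole pipeline boils down to "count the true flags per document, then histogram those counts"; a plain-JavaScript sketch over the three sample documents (illustrative only):

```javascript
// Count true flags per document, then tally how many documents share each count.
const FLAGS = ["workHome", "commute", "tel", "weekend"];
const docs = [
  { _id: "xxxx", workHome: true, commute: true, tel: false, weekend: true },
  { _id: "yyyy", workHome: false, commute: true, tel: false, weekend: true },
  { _id: "zzzz", workHome: false, commute: false, tel: false, weekend: false },
];

const histogram = {};
for (const d of docs) {
  const size = FLAGS.filter((f) => d[f]).length;
  histogram[size] = (histogram[size] || 0) + 1;
}
// histogram is { 0: 1, 2: 1, 3: 1 }, matching the pipeline output above.
```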
For the record, this is so much nicer with the upcoming release of MongoDB ( as of writing ) which includes the $filter operator:
db.collection.aggregate([
{ "$group": {
"_id": {
"$size": {
"$filter": {
"input": { "$map": {
"input": ["A","B","C","D"],
"as": "el",
"in": { "$cond": [
{ "$eq": [ "$$el", "A" ] },
"$workHome",
{ "$cond": [
{ "$eq": [ "$$el", "B" ] },
"$commute",
{ "$cond": [
{ "$eq": [ "$$el", "C" ] },
"$tel",
"$weekend"
]}
]}
]}
}},
"as": "el",
"cond": {
"$eq": [ "$$el", true ]
}
}
}
},
"count": { "$sum": 1 }
}},
{ "$sort": { "_id": 1 } }
])
So now just "two" pipeline stages doing the same thing as the original statement that will work from MongoDB 2.6 and above.
Therefore if your own application is in "development" itself, or you are otherwise curious, then take a look at the Development Branch releases where this functionality is available now.

Using $project to return an array

I have a collection with documents which look like this:
{
"campaignType" : 1,
"allowAccessControl" : true,
"userId" : "108028399"
}
I'd like to query this collection using aggregation framework and have a result which looks like this:
{
"campaignType" : ["APPLICATION"],
"allowAccessControl" : "true",
"userId" : "108028399",
}
You will notice that:
the campaignType field becomes an array
the numeric value was mapped to a string
Can that be done using aggregation framework?
I tried looking at $addToSet and $push but had no luck.
Please help.
Thanks
In either case here it is the $cond operator from the aggregation framework that is your friend. It is a "ternary" operator, which means it evaluates a condition for true/false and then returns a result based on that evaluation.
So for modern versions from MongoDB 2.6 and upwards you can $project with usage of the $map operator to construct the array:
db.campaign.aggregate([
{ "$project": {
"campaignType": {
"$map": {
"input": { "$literal": [1] },
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$campaignType", 1 ] },
"APPLICATION",
false
]
}
}
},
"allowAccessControl" : 1,
"userId": 1
}}
])
Or generally in most versions you can simply use the $push operator in a $group pipeline stage:
db.campaign.aggregate([
{ "$group": {
"_id": "$_id",
"campaignType": {
"$push": {
"$cond": [
{ "$eq": [ "$campaignType", 1 ] },
"APPLICATION",
false
]
}
},
"allowAccessControl": { "$first": "$allowAccessControl" },
"userId": { "$first": "$userId" }
}}
])
But the general concept is that you use "nested" expressions with the $cond operator in order to "test" and return some value that matches your "mapping" condition, and do that with another operator that allows you to produce an array.
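In plain JavaScript the $cond mapping is just a conditional expression wrapped in an array (illustrative only):

```javascript
// Plain-JS analogue of the $map/$cond projection: map the numeric
// campaignType to a one-element string array, keeping the other fields.
const doc = { campaignType: 1, allowAccessControl: true, userId: "108028399" };

const projected = {
  ...doc,
  campaignType: [doc.campaignType === 1 ? "APPLICATION" : false],
};
```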