I want to know if there is any way of skipping stages in the aggregation pipeline; more concretely, stopping and returning if one of the $lookup stages finds a match.
I need a query for retrieving "inherited" data from other types and/or groups. In this case I have three different collections: devices_properties, types_properties, and groups_properties, which store the properties for each device, type, or group.
If a device has a property defined, e.g. geofences, it can be read directly from devices_properties; if not, it is necessary to check its type and/or its group to see if it is defined there. If it is found on the type, then it is not necessary to check the group.
I have a query that works by checking its type/group and doing a $lookup over the different collections. Then, with a switch, it returns the appropriate document. However, it is not optimal, as often the property will be located in the first collection, devices_properties. In that case it performs two unnecessary lookups, since there is no need to check the device's type and group and fetch their respective properties. Not sure I explained it correctly.
The query I have right now is the following. Is there any way to optimize it, i.e., stop after the first $lookup if there is a match?
db.devices.aggregate([
{"$match" : { "_id": "alvarolb#esp32"}},
{"$project" : {
"_id": false,
"asset_group": {"$concat" : ["alvarolb", "#", "$asset_group", ":", "geofences"]},
"asset_type": {"$concat" : ["alvarolb", "#", "$asset_type", ":", "geofences"]}
}},
{"$lookup" : {
"from": "devices_properties",
"pipeline": [
{"$match" : {"_id": "alvarolb#esp32:geofences"}},
],
"as": "device"
}},
{ "$unwind": {
"path": "$device",
"preserveNullAndEmptyArrays": true
}},
{"$lookup" : {
"from": "groups_properties",
"let" : {"asset_group" : "$asset_group"},
"pipeline": [
{"$match" : {"$expr" : { "$eq" : ["$_id", "$$asset_group"]}}}
],
"as": "group"
}},
{ "$unwind": {
"path": "$group",
"preserveNullAndEmptyArrays": true
}},
{"$lookup" : {
"from": "types_properties",
"let" : {"asset_type" : "$asset_type"},
"pipeline": [
{"$match" : {"$expr" : { "$eq" : ["$_id", "$$asset_type"]}}}
],
"as": "type"
}},
{ "$unwind": {
"path": "$type",
"preserveNullAndEmptyArrays": true
}},
{"$project" : {
"value": {
"$switch" : {
"branches" : [
{"case": "$device", "then" : "$device"},
{"case": "$type", "then" : "$type"},
{"case": "$group", "then" : "$group"}
],
"default": {}
}
}
}},
{"$replaceRoot": { "newRoot": "$value"}}
]);
Thanks!
I doubt this particular query requires optimisation, but conditional stages in an aggregation pipeline are an interesting question in general.
So first things first: in the first stage you select at most 1 document by an indexed field, which is already quite optimal. All your lookups do the same, so we are talking about a magnitude of a few dozen millis for the whole pipeline, even on large collections. Is it worth optimising?
For the more generic case, when lookups are indeed expensive, you can employ a combination of $facet to run conditional pipelines and $concatArrays to merge the results.
The first lookup remains as is:
db.devices.aggregate([
....
{"$lookup" : {
"from": "devices_properties",
"pipeline": [
{"$match" : {"_id": "alvarolb#esp32:geofences"}},
],
"as": "device"
}},
Then we add an indicator of whether it returned any result, so we know whether more lookups are needed:
{$addFields:{found: {$size: "$device"}}},
Then we define 2 pipelines in the facet: one with the next lookup, another without. The switch that decides which one runs is the first $match stage in each pipeline:
{$facet:{
yes:[
{$match: {"$expr" : {$gt:["$found", 0]}}},
],
no:[
{$match: {"$expr" : {$eq:["$found", 0]}}},
{"$lookup" : {
"from": "groups_properties",
"let" : {"asset_group" : "$asset_group"},
"pipeline": [
{"$match" : {"$expr" : { "$eq" : ["$_id", "$$asset_group"]}}}
],
"as": "group"
}}
]
}},
After this stage we have 2 arrays, "yes" and "no"; one of them is always empty. Merge both and convert back to top-level documents:
{$addFields: {yesno: {$concatArrays:["$yes", "$no"]}}},
{$unwind: "$yesno"},
{"$replaceRoot": { "newRoot": "$yesno"}},
Recalculate the indicator of whether we have found anything so far:
{$addFields:{found: {$add: [ "$found", {$size: {$ifNull:["$group", []]}}]}}},
and repeat the same technique for the next lookup, this time against types_properties, as sketched below:
$facet with $lookup in `types_properties`
$addFields with $concatArrays
$unwind
$replaceRoot
Finally, finish with the $project/$replaceRoot stages as in the original pipeline.
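For concreteness, a sketch of that types_properties round (assuming the "found" counter has just been recalculated as shown above):
{$facet:{
    yes:[
        {$match: {"$expr" : {$gt:["$found", 0]}}}
    ],
    no:[
        {$match: {"$expr" : {$eq:["$found", 0]}}},
        {"$lookup" : {
            "from": "types_properties",
            "let" : {"asset_type" : "$asset_type"},
            "pipeline": [
                {"$match" : {"$expr" : { "$eq" : ["$_id", "$$asset_type"]}}}
            ],
            "as": "type"
        }}
    ]
}},
{$addFields: {yesno: {$concatArrays:["$yes", "$no"]}}},
{$unwind: "$yesno"},
{"$replaceRoot": { "newRoot": "$yesno"}}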
Related
The query returns the elements in the order in which they are stored in their collection, ignoring the order of the initial array. This affects the functioning of our system. Is there any extra command to put them in the correct order? Is there any workaround available?
Here follows a simple example:
Collection1 Document
{
"_id":ObjectId("5c781752176c512f180048e3"),
"Name":"Pedro",
"Classes":[
{"ID": ObjectId("5c7af2b2f6f6e47c9060d7ce") },
{"ID": ObjectId("5c7af2bcf6f6e47c9060d7cf") },
{"ID": ObjectId("5c7af2aaf6f6e47c9060d7cd") }
]
}
Collection2 Documents
{
"_id":ObjectId("5c7af2aaf6f6e47c9060d7cd"),
"variable1":"A"
},
{
"_id": ObjectId("5c7af2b2f6f6e47c9060d7ce"),
"variable1":"B"
},
{
"_id": ObjectId("5c7af2bcf6f6e47c9060d7cf"),
"variable1":"C"
}
The query:
aggregate(
pipeline = '[
{"$match": {"_id": {"$oid": "5c781752176c512f180048e3"}}},
{"$lookup": {"from": "collection2", "localField": "Classes.ID", "foreignField": "_id", "as": "Collection2_doc"}}
]'
)
Returns (note the result's order):
[
{
"_id":ObjectId("5c7af2aaf6f6e47c9060d7cd"),
"variable1":"A"
},
{
"_id": ObjectId("5c7af2b2f6f6e47c9060d7ce"),
"variable1":"B"
},
{
"_id": ObjectId("5c7af2bcf6f6e47c9060d7cf"),
"variable1":"C"
}
]
Expected order (the first document's array order):
[
{
"_id": ObjectId("5c7af2b2f6f6e47c9060d7ce"),
"variable1":"B"
},
{
"_id": ObjectId("5c7af2bcf6f6e47c9060d7cf"),
"variable1":"C"
},
{
"_id":ObjectId("5c7af2aaf6f6e47c9060d7cd"),
"variable1":"A"
}
]
Is there any extra command, e.g. $sort, that could be used to return it respecting the original array's order?
This is "by design" of the $lookup implementation. What actually happens "under the hood" is MongoDB internall converts the arguments in the $lookup to the new expressive format using $expr and $in. Even in versions prior to when this expressive form was implemented, the internal mechanics for an "array of values" was really much the same.
The solution here is to maintain a copy of the original array as a reference for reordering the "joined" items:
collection.aggregate([
{"$match": {"_id": ObjectId("5c781752176c512f180048e3") }},
{"$lookup": {
"from": "collection2",
"let": { "classIds": "$Classes.ID" },
"pipeline": [
{ "$match": {
"$expr": { "$in": [ "$_id", "$$classIds" ] }
}},
{ "$addFields": {
"sort": {
"$indexOfArray": [ "$$classIds", "$_id" ]
}
}},
{ "$sort": { "sort": 1 } },
{ "$addFields": { "sort": "$$REMOVE" }}
],
"as": "results"
}}
])
Or by the legacy $lookup usage:
collection.aggregate([
{"$match": {"_id": ObjectId("5c781752176c512f180048e3") }},
{"$lookup": {
"from": "collection2",
"localField": "Classes.ID",
"foreignField": "_id",
"as": "results"
}},
{ "$unwind": "$results" },
{ "$addFields": {
"sort": {
"$indexOfArray": [ "$Classes.ID", "$results._id" ]
}
}},
{ "$sort": { "_id": 1, "sort": 1 } },
{ "$group": {
"_id": "$_id",
"Name": { "$first": "$Name" },
"Classes": { "$first": "$Classes" },
"results": { "$push": "$results" }
}}
])
Both variants produce the same output:
{
"_id" : ObjectId("5c781752176c512f180048e3"),
"Name" : "Pedro",
"Classes" : [
{
"ID" : ObjectId("5c7af2b2f6f6e47c9060d7ce")
},
{
"ID" : ObjectId("5c7af2bcf6f6e47c9060d7cf")
},
{
"ID" : ObjectId("5c7af2aaf6f6e47c9060d7cd")
}
],
"results" : [
{
"_id" : ObjectId("5c7af2b2f6f6e47c9060d7ce"),
"variable1" : "B"
},
{
"_id" : ObjectId("5c7af2bcf6f6e47c9060d7cf"),
"variable1" : "C"
},
{
"_id" : ObjectId("5c7af2aaf6f6e47c9060d7cd"),
"variable1" : "A"
}
]
}
The general concept is to use $indexOfArray to compare the _id value from the "joined" content and find its "index" position in the original source array from "$Classes.ID". The different $lookup syntax variants have different approaches to how you access this copy and how you basically reconstruct the order.
The $sort of course sets the order of the actual documents, either inside the pipeline processing for the expressive form, or via the documents exposed by $unwind. Where you used $unwind you would then $group back to the original document form.
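As a quick illustration of the operator on its own (values chosen for demonstration):
// returns the position of the first match, or -1 when the value is absent
{ "$indexOfArray": [ [ "B", "C", "A" ], "C" ] }   // resolves to 1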
NOTE: The usage examples here depend on MongoDB 3.4 for the $indexOfArray at least and the $$REMOVE aligns with MongoDB 3.6 as would the expressive $lookup.
There are other approaches to re-ordering the array for prior releases, but these are demonstrated in more detail in "Does MongoDB's $in clause guarantee order". Realistically, the bare minimum you should presently be running as a production MongoDB version is the 3.4 release.
See Support Policy under MongoDB Server for the full details of supported releases and end dates.
I am trying to understand why a $lookup I'm using in my MongoDB aggregation is producing the result it is.
First off, my initial data looks like this:
"subscriptions": [
{
"agency": "3dg2672f145d0598be095634", // This is an ObjectId
"memberType": "primary"
}
]
Now, what I want to do is a simple $lookup, pulling in the related data for the ObjectId that's currently populated as the value of the "agency" field.
What I tried doing was a $lookup like this:
{
"from" : "agencies",
"localField" : "subscriptions.0.agency",
"foreignField" : "_id",
"as" : "subscriptions.0.agency"
}
So, basically what I want to do is go get that info related to that ObjectId ref, and populate it right here, in place of where the ObjectId currently resides.
What I'd expect as a result is something like this:
"subscriptions": [
{
"agency": [
{
_id: <id-value>,
name: <name-value>,
address: <address-value>
}
],
"memberType": "primary"
}
]
Instead, I end up with this (with my "memberType" prop now nowhere to be found):
"subscriptions" : {
"0" : {
"agency" : [ <agency-data> ]
}
}
Why is this the result of the $lookup, and how can I get the data structure I'm looking for here?
To clarify further: in the docs they mention using an $unwind BEFORE the $lookup when it's an array field. But in this case, the actual local field being targeted and replaced by the $lookup is NOT an array, though it is within an array. So I'm not clear on what the problem is.
You need to use $unwind so that your "localField" can match the "foreignField", and then $group to roll the array back up again:
db.collection.aggregate([
    { "$unwind": "$subscriptions" },
    { "$lookup": {
        "from": Agency.collection.name,
        "localField": "subscriptions.agency",
        "foreignField": "_id",
        "as": "subscriptions.agency"
    }},
    { "$group": {
        "_id": "$_id",
        // "memberType" lives inside each subscriptions element,
        // so the $push below carries it along
        "subscriptions": { "$push": "$subscriptions" }
    }}
])
Basically, what OP is looking for is to transform the data into the desired format after looking up into another collection. Assuming there are two collections, C1 and C2, where C1 contains the document
{ "_id" : ObjectId("5b50b8ebfd2b5637081105c6"), "subscriptions" : [ { "agency" : "3dg", "memberyType" : "primary" } ] }
and C2 contains
{ "_id" : ObjectId("5b50b984fd2b5637081105c8"), "agency" : "3dg", "name" : "ABC", "address" : "1 some street" }
if the following query is executed against the database
db.C1.aggregate([
{$unwind: "$subscriptions"},
{
$lookup: {
from: "C2",
localField: "subscriptions.agency",
foreignField: "agency",
as: "subscriptions.agency"
}
}
])
we get the result
{
"_id": ObjectId("5b50b8ebfd2b5637081105c6"),
"subscriptions": {
"agency": [{
"_id": ObjectId("5b50b984fd2b5637081105c8"),
"agency": "3dg",
"name": "ABC",
"address": "1 some street"
}],
"memberyType": "primary"
}
}
which is pretty close to what OP is looking for.
Note: there may be some edge cases, but with minor tweaks this solution should work; see the sketch below.
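One such tweak, reusing the C1/C2 collections from above, is a final $group to roll the unwound subscriptions back into an array:
db.C1.aggregate([
    {$unwind: "$subscriptions"},
    {
        $lookup: {
            from: "C2",
            localField: "subscriptions.agency",
            foreignField: "agency",
            as: "subscriptions.agency"
        }
    },
    // roll the unwound elements back into the original array shape;
    // "memberType" travels inside each element, so $push preserves it
    {
        $group: {
            _id: "$_id",
            subscriptions: {$push: "$subscriptions"}
        }
    }
])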
I am trying to perform a $lookup on a collection with conditions. The problem I am facing is that I would like to match the text field of all the objects inside an array (the accounts array) in the other (plates) collection.
I have tried using $map as well as $in and $setIntersection, but nothing seems to work, and I am unable to find a way to match the text fields of each of the objects in the array.
My document structures are as follows:
plates collection:
{
"_id": "Batch 1",
"rego" : "1QX-WA-123",
"date" : 1516374000000.0
"accounts": [{
"text": "Acc1",
"date": 1516374000000
},{
"text": "Acc2",
"date": 1516474000000
}]
}
accounts collection:
{
"_id": "Acc1",
"date": 1516374000000
"createdAt" : 1513810712802.0
}
I am trying to achieve something like this:
{
$lookup: {
from: 'plates',
let: { 'accountId': '$_id' },
pipeline: [{
'$match': {
'$expr': { '$and': [
{ '$eq': [ '$account.text', '$$accountId' ] },
{ '$gte': [ '$date', ISODate ("2016-01-01T00:00:00.000Z").getTime() ] },
{ '$lte': [ '$date', ISODate ("2019-01-01T00:00:00.000Z").getTime() ] }
]}
}
}],
as: 'cusips'
}
},
The output I am trying to get is:
{
"_id": "Acc1",
"date": 1516374000000
"createdAt" : 1513810712802.0,
"plates": [{
"_id": "Batch 1",
"rego": "1QX-WA-123"
}]
}
Personally, I would initiate the aggregation from the "plates" collection instead, where the initial $match conditions can filter the date range more cleanly. Getting your desired output is then a simple matter of "unwinding" the resulting "accounts" matches and "inverting" the content.
This is easy enough with the MongoDB 3.6 features, which you must have anyway in order to use $lookup with $expr. We don't even need that form of $lookup here:
db.plates.aggregate([
{ "$match": {
"date": {
"$gte": new Date("2016-01-01").getTime(),
"$lte": new Date("2019-01-01").getTime()
}
}},
{ "$lookup": {
"from": "accounts",
"localField": "accounts.text",
"foreignField": "_id",
"as": "accounts"
}},
{ "$unwind": "$accounts" },
{ "$group": {
"_id": "$accounts",
"plates": { "$push": { "_id": "$_id", "rego": "$rego" } }
}},
{ "$replaceRoot": {
"newRoot": {
"$mergeObjects": ["$_id", { "plates": "$plates" }]
}
}}
])
This of course is an "INNER JOIN", which would only return "accounts" entries where there is a matching "plates" document for the conditions.
Doing the "join" from the "accounts" collection means you need additional handling to remove the non-matching entries from the "accounts" array within the "plates" collection:
db.accounts.aggregate([
{ "$lookup": {
"from": "plates",
"let": { "account": "$_id" },
"pipeline": [
{ "$match": {
"date": {
"$gte": new Date("2016-01-01").getTime(),
"$lte": new Date("2019-01-01").getTime()
},
"$expr": { "$in": [ "$$account", "$accounts.text" ] }
}},
{ "$project": { "_id": 1, "rego": 1 } }
],
"as": "plates"
}}
])
Note that the $match on the "date" properties should be expressed as a regular query condition instead of within the $expr block for optimal performance of the query.
The $in is used to compare the "array" of "$accounts.text" values to the local variable defined for the "_id" value of the "accounts" document being joined. So the first argument to $in is the "single" value and the second is the "array" of just the "text" values which should be matched.
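As a quick illustration with values from the sample documents (this is the aggregation $in operator, not the query operator of the same name):
// element first, candidate array second
{ "$in": [ "Acc1", [ "Acc1", "Acc2" ] ] }   // resolves to true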
This is also notably a "LEFT JOIN", which returns all "accounts" regardless of whether there are any "plates" matching the conditions, and therefore you can possibly end up with an empty "plates" array in the results returned. You can filter those out if you don't want them (as shown below), but where that is the case the former query form is really far more efficient than this one, since the relation is defined and we only ever deal with "plates" which would meet the criteria.
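If you did want to drop those empty results, a minimal sketch of a final stage to append to the pipeline above (a plain query condition on the joined array):
// keep only accounts where at least one plate was joined
{ "$match": { "plates.0": { "$exists": true } } }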
Either method returns the same response from the data provided in the question:
{
"_id" : "Acc1",
"date" : 1516374000000,
"createdAt" : 1513810712802,
"plates" : [
{
"_id" : "Batch 1",
"rego" : "1QX-WA-123"
}
]
}
Which direction you actually take that from really depends on whether the "LEFT" or "INNER" join form is what you really want and also where the most efficient query conditions can be made for the items you actually want to select.
Hmm, not sure how you tried $in, but it works for me:
{
$lookup: {
from: 'plates',
let: { 'accountId': '$_id' },
pipeline: [{
'$match': {
'$expr': { '$and': [
{ '$in': [ '$$accountId', '$accounts.text'] },
{ '$gte': [ '$date', ISODate ("2016-01-01T00:00:00.000Z").getTime() ] },
{ '$lte': [ '$date', ISODate ("2019-01-01T00:00:00.000Z").getTime() ] }
]}
},
}],
as: 'cusips'
}
}
In MongoDB I found some strange behavior of $or; consider the collection below:
{ "_id" : 1, "to" : [ { "_id" : 2 }, { "_id" : 4, "valid" : true } ] }
When aggregating with $match:
db.ooo.aggregate([{$match:{ $or: ['$to', '$valid'] }}])
it will throw an error with "aggregate failed" (sure, I know how to fix it...):
"ok" : 0,
"errmsg" : "$or/$and/$nor entries need to be full objects",
"code" : 2,
"codeName" : "BadValue"
But if the $or is used in a $cond statement:
db.ooo.aggregate([{ "$redact": {
"$cond": {
"if": { $or: ["$to", "$valid"] },
"then": "$$DESCEND",
"else": "$$PRUNE"
}
}}])
the result is shown and no error is thrown; see mongodb aggregate $redact to filter array elements
The question is: what's going on with the $or syntax? Why does the same condition not work in $match but work in $cond?
I also looked up the docs:
$cond
If the <boolean-expression> evaluates to true, then $cond evaluates and returns the value of the <true-case> expression. Otherwise, $cond evaluates and returns the value of the <false-case> expression.
The arguments can be any valid expression. For more information on expressions, see Expressions.
$or
Evaluates one or more expressions and returns true if any of the expressions are true. Otherwise, $or returns false.
For more information on expressions, see Expressions.
PS: I'm using MongoDB 3.4.5 but have not tested on other versions.
I don't have a clue...
UPDATE
Based on the answer from @Neil, I also tried the $filter usage with $or:
1.
db.ooo.aggregate([{ "$project": {
result:{
"$filter": {
"input": "$to",
"as": "el",
"cond": {$or: ["$$el.valid"]}
}
}
}}])
2.
db.ooo.aggregate([{ "$project": {
result:{
"$filter": {
"input": "$to",
"as": "el",
"cond": {$or: "$$el.valid"}
}
}
}}])
3.
db.ooo.aggregate([{ "$project": {
result:{
"$filter": {
"input": "$to",
"as": "el",
"cond": "$$el.valid"
}
}
}}])
For all three of the above $filter variants the syntax is OK: the result is shown and no error is thrown.
It seems $or will work with field names directly only in $cond (or a $filter cond)?
Or is this a hacky usage of $or?
The $redact pipeline stage is the wrong thing for this type of operation. Instead use $filter, which actually "filters" things from arrays:
db.ooo.aggregate([
{ "$addFields": {
"to": {
"$filter": {
"input": "$to",
"as": "t",
"cond": { "$ifNull": [ "$$t.valid", false ] }
}
}
}}
])
Produces:
{
"_id" : ObjectId("594c5b0a212a102096cebf7e"),
"id" : 1,
"to" : [
{
"id" : 4,
"valid" : true
}
]
}
The problem with $redact in this case is that the only way you can actually "redact" content from an array is by using $$DESCEND on the false condition. This is recursive and evaluates the expression from the top level of the document downwards. At whatever level the condition is not met, $redact will discard the content. No "valid" field at the "top level" means it would discard the whole document, unless we gave an alternate condition.
Since not all array elements have the "valid" field, and neither does the top level of the document, we cannot even "hack it" to pretend something is there.
For example you appear to be trying to do this:
db.ooo.aggregate([
{ "$redact": {
"$cond": {
"if": {
"$or": [
{ "$ifNull": [ "$$ROOT.to", false ] },
"$valid"
]
},
"then": "$$DESCEND",
"else": "$$PRUNE"
}
}}
])
But when you look carefully, that kind of comparison will essentially "always" evaluate to true, no matter how hard you try to hack a condition.
You could do:
db.ooo.aggregate([
{ "$redact": {
"$cond": {
"if": {
"$or": [
{ "$ifNull": [ "$to", false ] },
"$valid"
]
},
"then": "$$DESCEND",
"else": "$$PRUNE"
}
}}
])
Which does correctly remove the element from the array:
{
"_id" : ObjectId("594c5b0a212a102096cebf7e"),
"id" : 1,
"to" : [
{
"id" : 4,
"valid" : true
}
]
}
But it is overkill and adds unnecessary overhead to logic processing when the simple $filter will do in this case. You should only need this form when there are actually "nested" arrays that need to be recursively processed, and all conditions can actually be met at all levels, as in the contrived example below.
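For illustration, a sketch where every level does carry the field, so the recursion is actually meaningful (the collection name and data here are assumed purely for the demo):
db.nested.insertOne({
    "valid": true,
    "to": [
        { "valid": true, "to": [ { "valid": false } ] },
        { "valid": false }
    ]
})
db.nested.aggregate([
    { "$redact": {
        "$cond": {
            "if": { "$ifNull": [ "$valid", false ] },
            // keeps the top level and the first branch,
            // prunes anything marked false at any depth
            "then": "$$DESCEND",
            "else": "$$PRUNE"
        }
    }}
])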
The lesson here is to use the correct operators for their designed purpose.
I am new to MongoDB and need help in accomplishing my task:
I am using MongoDB to query for actions that were taken by a person. The actions are embedded in the person document like this:
{
"_id" : ObjectId("56447ac0583d4871570041c3"),
"email" : "email#example.net",
"actions" : [
{
"name" : "support",
"created_at" : ISODate("2015-10-17T01:40:35.000Z"),
},
{
"name" : "hide",
"created_at" : ISODate("2015-10-16T01:40:35.000Z")
},
{
"name" : "support",
"created_at" : ISODate("2015-10-17T03:40:35.000Z"),
}
]
}
A person can have many actions with different action names (support and hide are just 2 examples).
I know that I could find all people with at least one support action like this:
db.test.find({'actions.name':'support'})
What I want to do is retrieve all people with at least X support actions. Is this possible without using JavaScript syntax? As people could have hundreds of actions, that would be slow.
So, if I want all people with at least 2 support actions, the only way I know of would be using the JS syntax:
db.test.find({$where: function() {
    return this.actions.filter(function(action){
        return action.name === 'support';
    }).length >= 2;
}});
Is there another/better/faster possibility for this query?
Well, the best way to do this is using the .aggregate() method, which provides access to the aggregation pipeline.
You can reduce the number of documents to process in the pipeline by using the $match operator to filter out all documents that don't match the given criteria.
You then use the $redact operator to return only documents where the number of elements with name "support" in your array is $gte 2. The $map operator here returns an array holding the subdocuments that match your criteria, and false for those that don't; the false entries are easily dropped using the $setDifference operator. Of course the $size operator returns the size of the resulting array.
db.test.aggregate([
{ "$match": {
"actions.name": "support",
"actions.2": { "$exists": true }
}},
{ "$redact": {
"$cond": [
{ "$gte": [
{ "$size": {
"$setDifference": [
{ "$map": {
"input": "$actions",
"as": "action",
"in": {
"$cond": [
{ "$eq": [ "$$action.name", "support" ] },
"$$action",
false
]
}
}},
[false]
]
}},
2
]},
"$$KEEP",
"$$PRUNE"
]
}}
])
From MongoDB 3.2 this can be handled using the $filter operator.
db.test.aggregate([
{ "$match": {
"actions.name": "support",
"actions.2": { "$exists": true }
}},
{ "$redact": {
"$cond": [
{ "$gte": [
{ "$size": {
"$filter": {
"input": "$actions",
"as": "action",
"cond": { "$eq": [ "$$action.name", "support" ] }
}
}},
2
]},
"$$KEEP",
"$$PRUNE"
]
}}
])
As @BlakesSeven pointed out:
$setDifference is fine as long as the data being filtered is "unique". In this case it "should" be fine, but if any two results contained the same date then it would skew the results by considering the two to be one. $filter is the better option where available, but if the data was not unique it would be necessary to $unwind at present.
I haven't benchmarked this against your attempt, but this sounds like a great use case for Mongo's aggregation framework.
db.test.aggregate([
    {$unwind: "$actions"},
    {$group: {
        // group by person and action name; grouping by the whole action
        // subdocument would split the count across different created_at values
        _id: { _id: "$_id", action: "$actions.name" },
        count: {$sum: 1}
    }},
    {$match: {$and: [{count: {$gte: 2}}, {"_id.action": "support"}]}}
]);
Note that I haven't run this in mongo, so it might have some syntax issues.
The idea behind it is:
unwind the actions array so each element of the array becomes its own document
group the resulting collection by an _id / action-name pair, and count how many we get of each.
match will then filter for only the things we are interested in.
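Against the sample document above, the corrected pipeline should emit something like the following for the matching group (a shape sketched by hand, not captured from a shell):
{
    "_id": { "_id": ObjectId("56447ac0583d4871570041c3"), "action": "support" },
    "count": 2
}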