Find Documents by the number of embedded array elements that match condition - mongodb

I am new to MongoDB and need help in accomplishing my task:
I am using MongoDB to query for actions that were taken by a person. The actions are embedded in the person document like this:
{
"_id" : ObjectId("56447ac0583d4871570041c3"),
"email" : "email@example.net",
"actions" : [
{
"name" : "support",
"created_at" : ISODate("2015-10-17T01:40:35.000Z"),
},
{
"name" : "hide",
"created_at" : ISODate("2015-10-16T01:40:35.000Z")
},
{
"name" : "support",
"created_at" : ISODate("2015-10-17T03:40:35.000Z"),
}
]
}
A person can have many actions with different action names (support and hide are just 2 examples).
I know that I could find all people with at least one support action like this:
db.test.find({'actions.name':'support'})
What I want to do is retrieve all people with at least X support actions. Is this possible without using JavaScript syntax? As people could have hundreds of actions, that would be slow.
So, if I want all people with at least 2 support actions, the only way I know would be using the JavaScript syntax:
db.test.find({$where: function() {
return this.actions.filter(function(action){
return action.name === 'support';
}).length >= 2;
}});
Is there another, better or faster way to write this query?

The best way to do this is with the .aggregate() method, which provides access to the aggregation pipeline.
You can reduce the number of documents to process in the pipeline by using the $match operator to filter out all documents that don't match the given criteria.
You then need the $redact operator to return only documents where the number of elements with the name "support" in your array is $gte 2. The $map operator here returns an array containing the subdocuments that match your criteria, plus false for every other element, which you can easily drop using the $setDifference operator. Finally, the $size operator returns the size of the resulting array.
db.test.aggregate([
{ "$match": {
"actions.name": "support",
"actions.1": { "$exists": true }
}},
{ "$redact": {
"$cond": [
{ "$gte": [
{ "$size": {
"$setDifference": [
{ "$map": {
"input": "$actions",
"as": "action",
"in": {
"$cond": [
{ "$eq": [ "$$action.name", "support" ] },
"$$action",
false
]
}
}},
[false]
]
}},
2
]},
"$$KEEP",
"$$PRUNE"
]
}}
])
From MongoDB 3.2 this can be handled using the $filter operator.
db.test.aggregate([
{ "$match": {
"actions.name": "support",
"actions.1": { "$exists": true }
}},
{ "$redact": {
"$cond": [
{ "$gte": [
{ "$size": {
"$filter": {
"input": "$actions",
"as": "action",
"cond": { "$eq": [ "$$action.name", "support" ] }
}
}},
2
]},
"$$KEEP",
"$$PRUNE"
]
}}
])
As @BlakesSeven pointed out:
$setDifference is fine as long as the data being filtered is "unique". In this case it "should" be fine, but if any two results contained the same date then it would skew results by considering the two to be one. $filter is the better option when it comes, but if data was not unique it would be necessary to $unwind at present.
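The uniqueness caveat is easy to see in a plain JavaScript model of the two approaches. This is only a sketch that mimics the set semantics of $setDifference outside MongoDB; the helper names countViaSetDifference and countViaFilter, and the duplicated timestamp in the sample data, are made up for illustration.

```javascript
// Plain-JS model (not MongoDB code) of counting "support" actions two ways.
// Helper names and sample data are hypothetical.

// Models $map + $setDifference: non-matching elements become false,
// then the result is treated as a SET, so identical elements collapse.
function countViaSetDifference(actions, name) {
  const mapped = actions.map((a) => (a.name === name ? a : false));
  const unique = new Set(mapped.map((a) => JSON.stringify(a)));
  unique.delete(JSON.stringify(false));
  return unique.size;
}

// Models $filter: keeps every matching element, duplicates included.
function countViaFilter(actions, name) {
  return actions.filter((a) => a.name === name).length;
}

// Two "support" actions that happen to share the same timestamp.
const sameTimestamp = [
  { name: "support", created_at: "2015-10-17T01:40:35.000Z" },
  { name: "support", created_at: "2015-10-17T01:40:35.000Z" },
  { name: "hide", created_at: "2015-10-16T01:40:35.000Z" },
];

console.log(countViaSetDifference(sameTimestamp, "support")); // 1
console.log(countViaFilter(sameTimestamp, "support")); // 2
```

With identical elements the set-based count drops to 1, so a "at least 2" test would wrongly reject this person, while the filter-based count stays at 2.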

I haven't benchmarked this against your attempt, but this sounds like a great use case for Mongo's aggregation framework.
db.test.aggregate([
// one document per array element
{$unwind: "$actions"},
// count actions per person and action name
{$group: {
_id: { _id: "$_id", action: "$actions.name" },
count: {$sum: 1}
}},
// keep people with at least 2 "support" actions
{$match: {$and: [{count: {$gte: 2}}, {"_id.action": "support"}]}}
]);
Note that I haven't run this in mongo, so it might have some syntax issues.
The idea behind it is:
unwind the actions array so each element of the array becomes its own document
group the resulting collection by an _id - action type pair, and count how many we get of each.
match will filter for only things we are interested in.
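The three steps above can be sketched in plain JavaScript, without a database. This is only a model of the idea, assuming the grouping is done on the action name and the threshold is "at least 2"; the sample people are made up.

```javascript
// Plain-JS model (not MongoDB code) of $unwind -> $group -> $match.
// Sample data is hypothetical.
const people = [
  { _id: 1, actions: [{ name: "support" }, { name: "hide" }, { name: "support" }] },
  { _id: 2, actions: [{ name: "support" }, { name: "hide" }] },
];

// $unwind: one record per array element.
const unwound = people.flatMap((p) =>
  p.actions.map((action) => ({ _id: p._id, action: action.name }))
);

// $group: count records per (_id, action name) pair.
const counts = new Map();
for (const { _id, action } of unwound) {
  const key = JSON.stringify({ _id, action });
  counts.set(key, (counts.get(key) || 0) + 1);
}

// $match: keep only "support" groups with at least 2 entries.
const matchingIds = [...counts]
  .map(([key, count]) => ({ ...JSON.parse(key), count }))
  .filter((g) => g.action === "support" && g.count >= 2)
  .map((g) => g._id);

console.log(matchingIds); // [ 1 ]
```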

Related

mongodb $or syntax in aggregation pipeline

In MongoDB I found a strange behavior of $or. Consider the collection below:
{ "_id" : 1, "to" : [ { "_id" : 2 }, { "_id" : 4, "valid" : true } ] }
When aggregating with $match:
db.ooo.aggregate([{$match:{ $or: ['$to', '$valid'] }}])
it throws an "aggregate failed" error (sure, I know how to fix it...):
"ok" : 0,
"errmsg" : "$or/$and/$nor entries need to be full objects",
"code" : 2,
"codeName" : "BadValue"
But If the $or used in a $cond statement:
db.ooo.aggregate([{ "$redact": {
"$cond": {
"if": { $or: ["$to", "$valid"] },
"then": "$$DESCEND",
"else": "$$PRUNE"
}
}}])
The result is shown and no error is thrown; see "mongodb aggregate $redact to filter array elements".
The question is: what's going on with the $or syntax? Why does the same condition not work in $match but work in $cond?
I also looked up the docs:
$cond
If the <boolean-expression> evaluates to true, then $cond evaluates and returns the value of the <true-case> expression. Otherwise, $cond evaluates and returns the value of the <false-case> expression.
The arguments can be any valid expression. For more information on expressions, see Expressions.
$or
Evaluates one or more expressions and returns true if any of the expressions are true. Otherwise, $or returns false.
For more information on expressions, see Expressions.
PS: I'm using MongoDB 3.4.5 and have not tested other versions.
I don't have a clue...
UPDATE
Based on the answer from @Neil, I also tried using $filter with $or:
1.
db.ooo.aggregate([{ "$project": {
result:{
"$filter": {
"input": "$to",
"as": "el",
"cond": {$or: ["$$el.valid"]}
}
}
}}])
2.
db.ooo.aggregate([{ "$project": {
result:{
"$filter": {
"input": "$to",
"as": "el",
"cond": {$or: "$$el.valid"}
}
}
}}])
3.
db.ooo.aggregate([{ "$project": {
result:{
"$filter": {
"input": "$to",
"as": "el",
"cond": "$$el.valid"
}
}
}}])
For all 3 $filter variants above, the syntax is OK: the result is shown and no error is thrown.
Does $or work with field names directly only in $cond or cond?
Or is this a hack usage of $or?
The $redact pipeline stage is the wrong thing for this type of operation. Instead use $filter, which actually "filters" things from arrays:
db.ooo.aggregate([
{ "$addFields": {
"to": {
"$filter": {
"input": "$to",
"as": "t",
"cond": { "$ifNull": [ "$$t.valid", false ] }
}
}
}}
])
Produces:
{
"_id" : ObjectId("594c5b0a212a102096cebf7e"),
"id" : 1,
"to" : [
{
"id" : 4,
"valid" : true
}
]
}
The problem with $redact in this case is the only way you can actually "redact" from an array is by using $$DESCEND on the false condition. This is recursive and evaluates the expression from the top level of the document downwards. At whatever level where the condition is not met, $redact will discard it. No "valid" field in the "top-level" means it would discard the whole document, unless we gave an alternate condition.
Since not all array elements have the "valid" field in "addition" to the top level of the document, we cannot even "hack it" to pretend something is there.
For example you appear to be trying to do this:
db.ooo.aggregate([
{ "$redact": {
"$cond": {
"if": {
"$or": [
{ "$ifNull": [ "$$ROOT.to", false ] },
"$valid"
]
},
"then": "$$DESCEND",
"else": "$$PRUNE"
}
}}
])
But when you look carefully, that kind of comparison will essentially "always" evaluate to true, no matter how hard you tried to hack a condition.
You could do:
db.ooo.aggregate([
{ "$redact": {
"$cond": {
"if": {
"$or": [
{ "$ifNull": [ "$to", false ] },
"$valid"
]
},
"then": "$$DESCEND",
"else": "$$PRUNE"
}
}}
])
Which does correctly remove the element from the array:
{
"_id" : ObjectId("594c5b0a212a102096cebf7e"),
"id" : 1,
"to" : [
{
"id" : 4,
"valid" : true
}
]
}
But it is overkill and adds unnecessary overhead to logic processing when the simple $filter will do in this case. You should only need this form when there are actually "nested" arrays that need to be recursively processed, and all conditions can actually be met at all levels.
The lesson here is to use the correct operators for their designed purpose.
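To make the recursive behaviour described above more concrete, here is a simplified plain JavaScript model of how $redact walks a document. It is only a sketch, not MongoDB's implementation: the functions redact, decide, isDoc and truthy are hypothetical names, and the model ignores special BSON types.

```javascript
// Simplified plain-JS model of $redact (not MongoDB code).
// `decide` returns "DESCEND", "PRUNE" or "KEEP" for each document level.
const isDoc = (v) => v !== null && typeof v === "object" && !Array.isArray(v);

function redact(doc, decide) {
  const action = decide(doc);
  if (action === "PRUNE") return undefined; // drop this level entirely
  if (action === "KEEP") return doc; // keep as-is, no further checks
  const out = {};
  for (const [key, value] of Object.entries(doc)) {
    if (Array.isArray(value)) {
      // Re-evaluate the condition for every embedded document in the array.
      out[key] = value
        .map((el) => (isDoc(el) ? redact(el, decide) : el))
        .filter((el) => el !== undefined);
    } else if (isDoc(value)) {
      const kept = redact(value, decide);
      if (kept !== undefined) out[key] = kept;
    } else {
      out[key] = value;
    }
  }
  return out;
}

// Aggregation-style truthiness: missing, null, false and 0 count as false.
const truthy = (v) => v !== undefined && v !== null && v !== false && v !== 0;

// Models { $or: [ { $ifNull: ["$to", false] }, "$valid" ] } at each level.
const decide = (level) =>
  truthy(level.to) || truthy(level.valid) ? "DESCEND" : "PRUNE";

const result = redact(
  { _id: 1, to: [{ _id: 2 }, { _id: 4, valid: true }] },
  decide
);
console.log(JSON.stringify(result)); // {"_id":1,"to":[{"_id":4,"valid":true}]}
```

The top level passes only because "to" exists there, and each array element passes only if it has a truthy "valid". That is why the condition has to be written so that it holds at every level that should survive.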

Count Multiple Date Ranges in a Query

I have the following aggregate query which gives me counts (countA) for a given date range period. In this case 01/01/2016-03/31/2016. Is it possible to add a second date range period, for example 04/01/2016-07/31/2016, and count these as countB?
db.getCollection('customers').aggregate(
{$match: {"status": "Closed"}},
{$unwind: "$lines"},
{$match: {"lines.status": "Closed"}},
{$match: {"lines.deliveryMethod": "Tech Delivers"}},
{$match: {"date": {$gte: new Date('01/01/2016'), $lte: new Date('03/31/2016')}}},
{$group:{_id:"$lines.productLine",countA: {$sum: 1}}}
)
Thanks in advance
Sure, and you can also simplify your pipeline stages quite a lot, mostly since successive $match stages are really a single stage, and that you should always use match criteria at the beginning of any aggregation pipeline. Even if it doesn't actually "filter" the array content, it at least just selects the documents containing entries that will actually match. This speeds things up immensely, and especially with large data sets.
For the two date ranges, well this is just an $or query argument. Also it would be applied "before" the array filtering is done, since after all it is a document level match to begin with. So again, in the very first pipeline $match:
db.getCollection('customers').aggregate([
// Filter all document conditions first. Reduces things to process.
{ "$match": {
"status": "Closed",
"lines": { "$elemMatch": {
"status": "Closed",
"deliveryMethod": "Tech Delivers"
}},
"$or": [
{ "date": {
"$gte": new Date("2016-01-01"),
"$lt": new Date("2016-04-01")
}},
{ "date": {
"$gte": new Date("2016-04-01"),
"$lt": new Date("2016-08-01")
}}
]
}},
// Unwind the array
{ "$unwind": "$lines" },
// Filter just the matching elements
// Successive $match is really just one pipeline stage
{ "$match": {
"lines.status": "Closed",
"lines.deliveryMethod": "Tech Delivers"
}},
// Then group on the productline values within the array
{ "$group":{
"_id": "$lines.productLine",
"countA": {
"$sum": {
"$cond": [
{ "$and": [
{ "$gte": [ "$date", new Date("2016-01-01") ] },
{ "$lt": [ "$date", new Date("2016-04-01") ] }
]},
1,
0
]
}
},
"countB": {
"$sum": {
"$cond": [
{ "$and": [
{ "$gte": [ "$date", new Date("2016-04-01") ] },
{ "$lt": [ "$date", new Date("2016-08-01") ] }
]},
1,
0
]
}
}
}}
])
The $or basically "joins" two result sets as it looks for "either" range criteria to apply. As this is given in addition to the other arguments, the logic is an "AND" condition as with the others on the criteria met with either $or argument. Note the $gte and $lt combination is also another form of expressing "AND" conditions on the same key.
The $elemMatch is applied since "both" criteria are required on the array element. If you just directly applied them with "dot notation", then all that really asks is that "at least one array element" matches each condition, rather than the array element matching "both" conditions.
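The difference between the two forms can be modelled in plain JavaScript. This is only an illustration of the matching rules, not MongoDB code, and the "Pickup" and "Open" values are made up.

```javascript
// Plain-JS model (not MongoDB code) of the two query forms.
// Sample values "Pickup" and "Open" are hypothetical.
const doc = {
  status: "Closed",
  lines: [
    { status: "Closed", deliveryMethod: "Pickup" },
    { status: "Open", deliveryMethod: "Tech Delivers" },
  ],
};

// Dot notation: each condition may be satisfied by a DIFFERENT element.
const dotNotationMatch =
  doc.lines.some((l) => l.status === "Closed") &&
  doc.lines.some((l) => l.deliveryMethod === "Tech Delivers");

// $elemMatch: a SINGLE element must satisfy both conditions.
const elemMatchMatch = doc.lines.some(
  (l) => l.status === "Closed" && l.deliveryMethod === "Tech Delivers"
);

console.log(dotNotationMatch); // true
console.log(elemMatchMatch); // false
```

This document would be selected by the dot notation form even though no single line is both "Closed" and "Tech Delivers", which is exactly what $elemMatch rules out.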
The later filtering after $unwind can use the "dot notation" since the array elements are now "de-normalised" into separate documents. So there is only one element per document to now match the conditions.
When you apply the $group, instead of just using { "$sum": 1 } you instead "conditionally" assess whether to count it or not by using $cond. Since both date ranges are within the results, you just need to determine if the current document being "rolled up" belongs to one date range or another. As a "ternary" (if/then/else) operator, this is what $cond provides.
It looks at the values within "date" in the document and if it matches the condition set ( first argument - if ) then it returns 1 ( second argument - then ), else it returns 0, effectively not adding to the current count.
Since these are "logical" conditions then the "AND" is expressed with a logical $and operator, which itself returns true or false, requiring both contained conditions to be true.
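The conditional counting can be modelled in plain JavaScript as follows. This is a sketch of the logic only, not MongoDB code; the rows stand in for documents after $unwind, and their product lines and dates are made up.

```javascript
// Plain-JS model (not MongoDB code) of
// { $sum: { $cond: [ <date in range>, 1, 0 ] } } per group.
// Sample rows are hypothetical.
const rows = [
  { productLine: "X", date: new Date("2016-02-10") },
  { productLine: "X", date: new Date("2016-05-02") },
  { productLine: "X", date: new Date("2016-07-31T18:00:00Z") },
  { productLine: "Y", date: new Date("2016-01-15") },
];

// Models $and of $gte and $lt: start <= date < end.
const inRange = (d, start, end) => d >= new Date(start) && d < new Date(end);

const totals = {};
for (const { productLine, date } of rows) {
  if (!totals[productLine]) totals[productLine] = { countA: 0, countB: 0 };
  // Each row adds 1 or 0 to each counter, like $cond returning 1 or 0.
  totals[productLine].countA += inRange(date, "2016-01-01", "2016-04-01") ? 1 : 0;
  totals[productLine].countB += inRange(date, "2016-04-01", "2016-08-01") ? 1 : 0;
}

console.log(JSON.stringify(totals));
// {"X":{"countA":1,"countB":2},"Y":{"countA":1,"countB":0}}
```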
Also note the correction in the Date object constructors, since if you do not instantiate with the string in that representation then the resulting Date is in "localtime" as opposed to the "UTC" format in which MongoDB is storing the dates. Only use a "local" constructor if you really mean that, and often people really don't.
The other note is the $lt date change, which should always be "one day" greater than the last date you are looking for. Remember these are "beginning of day" dates, and therefore you usually want all possible times within the date, and not just up to the beginning. So it's "less than the next day" as the correct condition.
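Both date notes can be checked with plain JavaScript Date objects, which is what the shell uses as well. A small sketch; the lateEvent timestamp is made up.

```javascript
// ISO date-only strings are parsed as UTC midnight.
const utcStart = new Date("2016-01-01");
console.log(utcStart.toISOString()); // 2016-01-01T00:00:00.000Z

// A slash-separated string is a non-standard format; common engines such as
// V8 read it as midnight in the LOCAL time zone, so its UTC value depends on
// where the code runs.
const localStart = new Date("01/01/2016");
console.log(localStart.getTimezoneOffset()); // minutes between UTC and local time

// A hypothetical event late on the last day of the first range.
const lateEvent = new Date("2016-03-31T15:30:00Z");

// An upper bound of "$lte 2016-03-31" stops at the START of that day
// and misses the event.
const caughtByLte = lateEvent <= new Date("2016-03-31");
// An upper bound of "$lt 2016-04-01" covers the whole last day.
const caughtByLt = lateEvent < new Date("2016-04-01");

console.log(caughtByLte); // false
console.log(caughtByLt); // true
```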
For the record, with MongoDB versions from 2.6, it's likely better to "pre-filter" the array content "before" you $unwind. This removes the overhead of producing new documents in the "de-normalizing" that occurs that would not match the conditions you want to apply to array elements.
For MongoDB 3.2 and greater, use $filter:
db.getCollection('customers').aggregate([
// Filter all document conditions first. Reduces things to process.
{ "$match": {
"status": "Closed",
"lines": { "$elemMatch": {
"status": "Closed",
"deliveryMethod": "Tech Delivers"
}},
"$or": [
{ "date": {
"$gte": new Date("2016-01-01"),
"$lt": new Date("2016-04-01")
}},
{ "date": {
"$gte": new Date("2016-04-01"),
"$lt": new Date("2016-08-01")
}}
]
}},
// Pre-filter the array content to matching elements
{ "$project": {
"lines": {
"$filter": {
"input": "$lines",
"as": "line",
"cond": {
"$and": [
{ "$eq": [ "$$line.status", "Closed" ] },
{ "$eq": [ "$$line.deliveryMethod", "Tech Delivers" ] }
]
}
}
}
}},
// Unwind the array
{ "$unwind": "$lines" },
// Then group on the productline values within the array
{ "$group":{
"_id": "$lines.productLine",
"countA": {
"$sum": {
"$cond": [
{ "$and": [
{ "$gte": [ "$date", new Date("2016-01-01") ] },
{ "$lt": [ "$date", new Date("2016-04-01") ] }
]},
1,
0
]
}
},
"countB": {
"$sum": {
"$cond": [
{ "$and": [
{ "$gte": [ "$date", new Date("2016-04-01") ] },
{ "$lt": [ "$date", new Date("2016-08-01") ] }
]},
1,
0
]
}
}
}}
])
Or, for MongoDB 2.6 and later, apply $redact instead:
db.getCollection('customers').aggregate([
// Filter all document conditions first. Reduces things to process.
{ "$match": {
"status": "Closed",
"lines": { "$elemMatch": {
"status": "Closed",
"deliveryMethod": "Tech Delivers"
}},
"$or": [
{ "date": {
"$gte": new Date("2016-01-01"),
"$lt": new Date("2016-04-01")
}},
{ "date": {
"$gte": new Date("2016-04-01"),
"$lt": new Date("2016-08-01")
}}
]
}},
// Pre-filter the array content to matching elements
{ "$redact": {
"$cond": {
"if": {
"$and": [
{ "$eq": [ "$status", "Closed" ] },
{ "$eq": [
{ "$ifNull": ["$deliveryMethod", "Tech Delivers" ] },
"Tech Delivers"
]}
]
},
"then": "$$DESCEND",
"else": "$$PRUNE"
}
}},
// Unwind the array
{ "$unwind": "$lines" },
// Then group on the productline values within the array
{ "$group":{
"_id": "$lines.productLine",
"countA": {
"$sum": {
"$cond": [
{ "$and": [
{ "$gte": [ "$date", new Date("2016-01-01") ] },
{ "$lt": [ "$date", new Date("2016-04-01") ] }
]},
1,
0
]
}
},
"countB": {
"$sum": {
"$cond": [
{ "$and": [
{ "$gte": [ "$date", new Date("2016-04-01") ] },
{ "$lt": [ "$date", new Date("2016-08-01") ] }
]},
1,
0
]
}
}
}}
])
Noting that funny little $ifNull in there which is necessary due to the recursive nature of $$DESCEND, since all levels of the document are inspected, including the "top level" document and then "descending" into subsequent arrays and members or even nested objects. The "status" field is present and has a value of "Closed" due to earlier query selection criteria for the top level field, but of course there is no "top level" element called "deliveryMethod", since it is only within the array elements.
That basically is the "care" that needs to be taken when using $redact like this, and if the structure of the document does not allow such conditions, then it's not really an option, so revert to processing $unwind then $match instead.
But where possible, use those methods in preference to the $unwind then $match processing, as it will save considerable time and use less resources by using the newer techniques instead.

Find all Mongo documents for which the max value in an array matches a query

Given a document structure such as this:
{
values: [
{ value: 10 },
{ value: 20 },
{ value: 30 }
]
}
I would like to search for all documents for which the maximum value in the array matches a query.
e.g. If I search for all documents for which the maximum is less than 25, then the example above would not match, since 30 > 25.
How can I do this?
Using the .aggregate() method which provides access to the aggregation pipeline.
If you are on version 3.2 or newer, the only stage in the pipeline is $redact, where you use the $$KEEP and $$PRUNE variables to return or discard documents depending on whether they match your criteria. In the $cond expression you use the $max operator to return the maximum value in the array of "values" returned by the $map operator. A single stage is enough because the $redact stage, as mentioned in the documentation:
Incorporates the functionality of $project and $match
db.collection.aggregate([
{ "$redact": {
"$cond": [
{ "$lt": [
{ "$max": {
"$map": {
"input": "$values",
"as": "value",
"in": "$$value.value"
}
}},
25
]},
"$$KEEP",
"$$PRUNE"
]
}}
])
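As a rough plain JavaScript model of what that single stage computes per document (not MongoDB code; the second sample document and the helper name maxBelow are made up):

```javascript
// Plain-JS model (not MongoDB code) of $max over $map, followed by $lt.
// The second document and the helper name are hypothetical.
const docs = [
  { _id: 1, values: [{ value: 10 }, { value: 20 }, { value: 30 }] },
  { _id: 2, values: [{ value: 5 }, { value: 24 }] },
];

// $map pulls out the numbers, $max picks the largest, $lt compares it to the limit.
const maxBelow = (doc, limit) =>
  Math.max(...doc.values.map((v) => v.value)) < limit;

// $$KEEP when the test passes, $$PRUNE otherwise.
const kept = docs.filter((d) => maxBelow(d, 25)).map((d) => d._id);
console.log(kept); // [ 2 ]
```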
From version 3.0 backward, you need to first denormalize your "values" array using the $unwind operator, then $group your documents by _id, compute the maximum using $max, and use the $push accumulator operator to return the array of values. The last stage is the $redact stage.
db.collection.aggregate([
{ "$unwind": "$values" },
{ "$group": {
"_id": "$_id",
"maxvalues": { "$max": "$values.value"},
"values": { "$push": "$values" }
}},
{ "$redact": {
"$cond": [
{ "$lt": [ "$maxvalues", 25 ] },
"$$KEEP",
"$$PRUNE"
]
}}
])
This is a job for aggregates. You'll want to create a pipeline that $unwinds the Date array, which will return an array of documents, one for each Date in the array.
ex:
{ "_id" : 1, "dates.startAt": [ date1, date2, date3] }
would become
[{"_id" : 1, "dates.startAt": date1},
{"_id" : 1, "dates.startAt": date2},
{"_id" : 1, "dates.startAt": date3}]
Combined with $max and $group (see the full list of aggregation expressions), you should be set.

Using $project to return an array

I have a collection with documents which look like this:
{
"campaignType" : 1,
"allowAccessControl" : true,
"userId" : "108028399"
}
I'd like to query this collection using aggregation framework and have a result which looks like this:
{
"campaignType" : ["APPLICATION"],
"allowAccessControl" : "true",
"userId" : "108028399",
}
You will notice that:
campaignType field becomes an array
the numeric value was mapped to a string
Can that be done using aggregation framework?
I tried looking at $addToSet and $push but had no luck.
Please help.
Thanks
In either case here it is the $cond operator from the aggregation framework that is your friend. It is a "ternary" operator, which means it evaluates a condition for true|false and then returns the result based on that evaluation.
So for modern versions from MongoDB 2.6 and upwards you can $project with usage of the $map operator to construct the array:
db.campaign.aggregate([
{ "$project": {
"campaignType": {
"$map": {
"input": { "$literal": [1] },
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$campaignType", 1 ] },
"APPLICATION",
false
]
}
}
},
"allowAccessControl" : 1,
"userId": 1
}}
])
Or generally in most versions you can simply use the $push operator in a $group pipeline stage:
db.campaign.aggregate([
{ "$group": {
"_id": "$_id",
"campaignType": {
"$push": {
"$cond": [
{ "$eq": [ "$campaignType", 1 ] },
"APPLICATION",
false
]
}
},
"allowAccessControl": { "$first": "$allowAccessControl" },
"userId": { "$first": "$userId" }
}}
])
But the general concept is that you use "nested" expressions with the $cond operator in order to "test" and return some value that matches your "mapping" condition, and do that with another operator that allows you to produce an array.
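The same "test, map, then wrap in an array" idea looks like this in plain JavaScript. It is only a model of the logic using the sample document from the question, not MongoDB code.

```javascript
// Plain-JS model (not MongoDB code) of the $cond mapping wrapped in an array.
const campaign = { campaignType: 1, allowAccessControl: true, userId: "108028399" };

// Equivalent of $cond: [ { $eq: ["$campaignType", 1] }, "APPLICATION", false ]
const label = campaign.campaignType === 1 ? "APPLICATION" : false;

const projected = {
  campaignType: [label], // single-element array, like $map over [1] or $push
  allowAccessControl: campaign.allowAccessControl,
  userId: campaign.userId,
};

console.log(JSON.stringify(projected));
// {"campaignType":["APPLICATION"],"allowAccessControl":true,"userId":"108028399"}
```

Additional numeric codes would need further nested conditions in the "else" position, which is the nesting the paragraph above refers to.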

MongoDB - limit response in array property?

I have a MongoDB collection indicators/
It returns statistical data such as:
/indicators/population
{
id: "population",
data : [
{
country : "A",
value : 100
},
{
country : "B",
value : 150
}
]
}
I would like to be able to limit the response to specific countries.
MongoDB doesn't seem to support this, so should I:
Restructure the MongoDB collection setup to allow this via native find()
Extend my API so that it allows filtering of the data array before returning to client
Other?
This is actually a very simple operation that just involves "projection" using the positional $ operator in order to match a given condition. In the case of a "singular" match that is:
db.collection.find(
{ "data.country": "A" },
{ "data.$": 1 }
)
And that will match the first element in the array which matches the condition as given in the query.
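The limitation of the positional projection is easiest to see next to a full filter. A plain JavaScript sketch of the difference (not MongoDB code; the second "A" entry is made up to show the effect):

```javascript
// Plain-JS model (not MongoDB code); the second "A" entry is hypothetical.
const indicator = {
  id: "population",
  data: [
    { country: "A", value: 100 },
    { country: "B", value: 150 },
    { country: "A", value: 120 },
  ],
};

// Positional $ projection: only the FIRST matching element comes back.
const firstOnly = indicator.data.find((d) => d.country === "A");

// Filtering the array: ALL matching elements come back.
const allMatches = indicator.data.filter((d) => d.country === "A");

console.log(firstOnly); // { country: 'A', value: 100 }
console.log(allMatches.length); // 2
```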
For more than one match, you need to invoke the aggregation framework for MongoDB:
db.collection.aggregate([
// Match documents that are possible first
{ "$match": {
"data.country": "A"
}},
// Unwind the array to "de-normalize" the documents
{ "$unwind": "$data" },
// Actually filter the now "expanded" array items
{ "$match": {
"data.country": "A"
}},
// Group back together
{ "$group": {
"_id": "$_id",
"data": { "$push": "$data" }
}}
])
Or with MongoDB 2.6 or greater, a little bit cleaner, or at least without the $unwind:
db.collection.aggregate([
// Match documents that are possible first
{ "$match": {
"data.country": "A"
}},
// Filter out the array in place
{ "$project": {
"data": {
"$setDifference": [
{
"$map": {
"input": "$data",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.country", "A" ] },
"$$el",
false
]
}
}
},
[false]
]
}
}}
])
If my understanding of the problem is correct, then you can use:
db.population.find({"population.data.country": {$in : ["A", "C"]}});