Related
This is my collection:
{
"Id" : "001",
"Data":[{
"updatedTime" : 1483209005,
"value" : 35
},
{
"updatedTime" : 1483209005,
"value" : 20
}
]
}
This was i tried:
db.A.aggregate([
{ "$group": {
"_id": "$Id",
"Difference": {
"$sum": {
"$cond": [
{ "$eq": [ "Data.$.value", 35.0 ] },
"$updatedTime",
{ "$cond": [
{ "$eq": [ "Data.$.value", 20.0 ] },
{ "$subtract": [ 0, "$updatedTime" ] },
0
]}
]
}
}
}}
])
But i get output like this:
{
"_id" : "001",
"Difference" : 0.0
}
I need to find difference bewteen two updatedDate fields in data array how to i do that?
You need to assign each element in your array to a variable using the $let operator in order to access the subdocument field with "dot notation" can use the $subtract and $abs. To get the first and second element, simply use the $arrayElemAt operator.
In the "in" expression you need to $subtract the two values and return the absolute value using the $abs operator.
db.collection.aggregate([
{ "$group": {
"_id": "$Id",
"Difference": {
"$sum": {
"$let": {
"vars": {
"first": { "$arrayElemAt": [ "$Data", 0 ] },
"second": { "$arrayElemAt": [ "$Data", 1 ] }
},
"in": {
"$abs": {
"$subtract": [
"$$first.updatedTime",
"$$second.updatedTime"
]
}
}
}
}
}
}}
])
As you said it will always have two elements, then do something like this --
db.getCollection('test').aggregate([{$project: {diff: {$abs: {$subtract: [{$arrayElemAt: ['$Data.value', 0]}, {$arrayElemAt: ['$Data.value', 1]}]}}}}])
What I try to do is fairly simple, I have an array inside a document ;
"tags": [
{
"t" : "architecture",
"n" : 12
},
{
"t" : "contemporary",
"n" : 2
},
{
"t" : "creative",
"n" : 1
},
{
"t" : "concrete",
"n" : 3
}
]
I want to push an array of items to array like
["architecture","blabladontexist"]
If item exists, I want to increment object's n value (in this case its architecture),
and if don't, add it as a new Item (with value of n=0) { "t": "blabladontexist", "n":0}
I have tried $addToSet, $set, $inc, $upsert: true with so many combinations and couldn't do it.
How can we do this in MongoDB?
With MongoDB 4.2 and newer, the update method can now take a document or an aggregate pipeline where the following stages can be used:
$addFields and its alias $set
$project and its alias $unset
$replaceRoot and its alias $replaceWith.
Armed with the above, your update operation with the aggregate pipeline will be to override the tags field by concatenating a filtered tags array and a mapped array of the input list with some data lookup in the map:
To start with, the aggregate expression that filters the tags array uses the $filter and it follows:
const myTags = ["architecture", "blabladontexist"];
{
"$filter": {
"input": "$tags",
"cond": {
"$not": [
{ "$in": ["$$this.t", myTags] }
]
}
}
}
which produces the filtered array of documents
[
{ "t" : "contemporary", "n" : 2 },
{ "t" : "creative", "n" : 1 },
{ "t" : "concrete", "n" : 3 }
]
Now the second part will be to derive the other array that will be concatenated to the above. This array requires a $map over the myTags input array as
{
"$map": {
"input": myTags,
"in": {
"$cond": {
"if": { "$in": ["$$this", "$tags.t"] },
"then": {
"t": "$$this",
"n": {
"$sum": [
{
"$arrayElemAt": [
"$tags.n",
{ "$indexOfArray": [ "$tags.t", "$$this" ] }
]
},
1
]
}
},
"else": { "t": "$$this", "n": 0 }
}
}
}
}
The above $map essentially loops over the input array and checks with each element whether it's in the tags array comparing the t property, if it exists then the value of the n field of the subdocument becomes its current n value
expressed with
{
"$arrayElemAt": [
"$tags.n",
{ "$indexOfArray": [ "$tags.t", "$$this" ] }
]
}
else add the default document with an n value of 0.
Overall, your update operation will be as follows
Your final update operation becomes:
const myTags = ["architecture", "blabladontexist"];
db.getCollection('coll').update(
{ "_id": "1234" },
[
{ "$set": {
"tags": {
"$concatArrays": [
{ "$filter": {
"input": "$tags",
"cond": { "$not": [ { "$in": ["$$this.t", myTags] } ] }
} },
{ "$map": {
"input": myTags,
"in": {
"$cond": [
{ "$in": ["$$this", "$tags.t"] },
{ "t": "$$this", "n": {
"$sum": [
{ "$arrayElemAt": [
"$tags.n",
{ "$indexOfArray": [ "$tags.t", "$$this" ] }
] },
1
]
} },
{ "t": "$$this", "n": 0 }
]
}
} }
]
}
} }
],
{ "upsert": true }
);
I don't believe this is possible to do in a single command.
MongoDB doesn't allow a $set (or $setOnInsert) and $inc to affect the same field in a single command.
You'll have to do one update command to attempt to $inc the field, and if that doesn't change any documents (n = 0), do the update to $set the field to it's default value.
I want to group all the documents according to a field but to restrict the number of documents grouped for each value.
Each message has a conversation_ID. I need to get 10 or lesser number of messages for each conversation_ID.
I am able to group according to the following command but can't figure out how to restrict the
number of grouped documents apart from slicing the results
Message.aggregate({'$group':{_id:'$conversation_ID',msgs:{'$push':{msgid:'$_id'}}}})
How to limit the length of msgs array for each conversation_ID to 10?
Modern
From MongoDB 3.6 there is a "novel" approach to this by using $lookup to perform a "self join" in much the same way as the original cursor processing demonstrated below.
Since in this release you can specify a "pipeline" argument to $lookup as a source for the "join", this essentially means you can use $match and $limit to gather and "limit" the entries for the array:
db.messages.aggregate([
{ "$group": { "_id": "$conversation_ID" } },
{ "$lookup": {
"from": "messages",
"let": { "conversation": "$_id" },
"pipeline": [
{ "$match": { "$expr": { "$eq": [ "$conversation_ID", "$$conversation" ] } }},
{ "$limit": 10 },
{ "$project": { "_id": 1 } }
],
"as": "msgs"
}}
])
You can optionally add additional projection after the $lookup in order to make the array items simply the values rather than documents with an _id key, but the basic result is there by simply doing the above.
There is still the outstanding SERVER-9277 which actually requests a "limit to push" directly, but using $lookup in this way is a viable alternative in the interim.
NOTE: There also is $slice which was introduced after writing the original answer and mentioned by "outstanding JIRA issue" in the original content. Whilst you can get the same result with small result sets, it does involve still "pushing everything" into the array and then later limiting the final array output to the desired length.
So that's the main distinction and why it's generally not practical to $slice for large results. But of course can be alternately used in cases where it is.
There are a few more details on mongodb group values by multiple fields about either alternate usage.
Original
As stated earlier, this is not impossible but certainly a horrible problem.
Actually if your main concern is that your resulting arrays are going to be exceptionally large, then you best approach is to submit for each distinct "conversation_ID" as an individual query and then combine your results. In very MongoDB 2.6 syntax which might need some tweaking depending on what your language implementation actually is:
var results = [];
db.messages.aggregate([
{ "$group": {
"_id": "$conversation_ID"
}}
]).forEach(function(doc) {
db.messages.aggregate([
{ "$match": { "conversation_ID": doc._id } },
{ "$limit": 10 },
{ "$group": {
"_id": "$conversation_ID",
"msgs": { "$push": "$_id" }
}}
]).forEach(function(res) {
results.push( res );
});
});
But it all depends on whether that is what you are trying to avoid. So on to the real answer:
The first issue here is that there is no function to "limit" the number of items that are "pushed" into an array. It is certainly something we would like, but the functionality does not presently exist.
The second issue is that even when pushing all items into an array, you cannot use $slice, or any similar operator in the aggregation pipeline. So there is no present way to get just the "top 10" results from a produced array with a simple operation.
But you can actually produce a set of operations to effectively "slice" on your grouping boundaries. It is fairly involved, and for example here I will reduce the array elements "sliced" to "six" only. The main reason here is to demonstrate the process and show how to do this without being destructive with arrays that do not contain the total you want to "slice" to.
Given a sample of documents:
{ "_id" : 1, "conversation_ID" : 123 }
{ "_id" : 2, "conversation_ID" : 123 }
{ "_id" : 3, "conversation_ID" : 123 }
{ "_id" : 4, "conversation_ID" : 123 }
{ "_id" : 5, "conversation_ID" : 123 }
{ "_id" : 6, "conversation_ID" : 123 }
{ "_id" : 7, "conversation_ID" : 123 }
{ "_id" : 8, "conversation_ID" : 123 }
{ "_id" : 9, "conversation_ID" : 123 }
{ "_id" : 10, "conversation_ID" : 123 }
{ "_id" : 11, "conversation_ID" : 123 }
{ "_id" : 12, "conversation_ID" : 456 }
{ "_id" : 13, "conversation_ID" : 456 }
{ "_id" : 14, "conversation_ID" : 456 }
{ "_id" : 15, "conversation_ID" : 456 }
{ "_id" : 16, "conversation_ID" : 456 }
You can see there that when grouping by your conditions you will get one array with ten elements and another with "five". What you want to do here reduce both to the top "six" without "destroying" the array that only will match to "five" elements.
And the following query:
db.messages.aggregate([
{ "$group": {
"_id": "$conversation_ID",
"first": { "$first": "$_id" },
"msgs": { "$push": "$_id" },
}},
{ "$unwind": "$msgs" },
{ "$project": {
"msgs": 1,
"first": 1,
"seen": { "$eq": [ "$first", "$msgs" ] }
}},
{ "$sort": { "seen": 1 }},
{ "$group": {
"_id": "$_id",
"msgs": {
"$push": {
"$cond": [ { "$not": "$seen" }, "$msgs", false ]
}
},
"first": { "$first": "$first" },
"second": { "$first": "$msgs" }
}},
{ "$unwind": "$msgs" },
{ "$project": {
"msgs": 1,
"first": 1,
"second": 1,
"seen": { "$eq": [ "$second", "$msgs" ] }
}},
{ "$sort": { "seen": 1 }},
{ "$group": {
"_id": "$_id",
"msgs": {
"$push": {
"$cond": [ { "$not": "$seen" }, "$msgs", false ]
}
},
"first": { "$first": "$first" },
"second": { "$first": "$second" },
"third": { "$first": "$msgs" }
}},
{ "$unwind": "$msgs" },
{ "$project": {
"msgs": 1,
"first": 1,
"second": 1,
"third": 1,
"seen": { "$eq": [ "$third", "$msgs" ] },
}},
{ "$sort": { "seen": 1 }},
{ "$group": {
"_id": "$_id",
"msgs": {
"$push": {
"$cond": [ { "$not": "$seen" }, "$msgs", false ]
}
},
"first": { "$first": "$first" },
"second": { "$first": "$second" },
"third": { "$first": "$third" },
"forth": { "$first": "$msgs" }
}},
{ "$unwind": "$msgs" },
{ "$project": {
"msgs": 1,
"first": 1,
"second": 1,
"third": 1,
"forth": 1,
"seen": { "$eq": [ "$forth", "$msgs" ] }
}},
{ "$sort": { "seen": 1 }},
{ "$group": {
"_id": "$_id",
"msgs": {
"$push": {
"$cond": [ { "$not": "$seen" }, "$msgs", false ]
}
},
"first": { "$first": "$first" },
"second": { "$first": "$second" },
"third": { "$first": "$third" },
"forth": { "$first": "$forth" },
"fifth": { "$first": "$msgs" }
}},
{ "$unwind": "$msgs" },
{ "$project": {
"msgs": 1,
"first": 1,
"second": 1,
"third": 1,
"forth": 1,
"fifth": 1,
"seen": { "$eq": [ "$fifth", "$msgs" ] }
}},
{ "$sort": { "seen": 1 }},
{ "$group": {
"_id": "$_id",
"msgs": {
"$push": {
"$cond": [ { "$not": "$seen" }, "$msgs", false ]
}
},
"first": { "$first": "$first" },
"second": { "$first": "$second" },
"third": { "$first": "$third" },
"forth": { "$first": "$forth" },
"fifth": { "$first": "$fifth" },
"sixth": { "$first": "$msgs" },
}},
{ "$project": {
"first": 1,
"second": 1,
"third": 1,
"forth": 1,
"fifth": 1,
"sixth": 1,
"pos": { "$const": [ 1,2,3,4,5,6 ] }
}},
{ "$unwind": "$pos" },
{ "$group": {
"_id": "$_id",
"msgs": {
"$push": {
"$cond": [
{ "$eq": [ "$pos", 1 ] },
"$first",
{ "$cond": [
{ "$eq": [ "$pos", 2 ] },
"$second",
{ "$cond": [
{ "$eq": [ "$pos", 3 ] },
"$third",
{ "$cond": [
{ "$eq": [ "$pos", 4 ] },
"$forth",
{ "$cond": [
{ "$eq": [ "$pos", 5 ] },
"$fifth",
{ "$cond": [
{ "$eq": [ "$pos", 6 ] },
"$sixth",
false
]}
]}
]}
]}
]}
]
}
}
}},
{ "$unwind": "$msgs" },
{ "$match": { "msgs": { "$ne": false } }},
{ "$group": {
"_id": "$_id",
"msgs": { "$push": "$msgs" }
}}
])
You get the top results in the array, up to six entries:
{ "_id" : 123, "msgs" : [ 1, 2, 3, 4, 5, 6 ] }
{ "_id" : 456, "msgs" : [ 12, 13, 14, 15 ] }
As you can see here, loads of fun.
After you have initially grouped you basically want to "pop" the $first value off of the stack for the array results. To make this process simplified a little, we actually do this in the initial operation. So the process becomes:
$unwind the array
Compare to the values already seen with an $eq equality match
$sort the results to "float" false unseen values to the top ( this still retains order )
$group back again and "pop" the $first unseen value as the next member on the stack. Also this uses the $cond operator to replace "seen" values in the array stack with false to help in the evaluation.
The final action with $cond is there to make sure that future iterations are not just adding the last value of the array over and over where the "slice" count is greater than the array members.
That whole process needs to be repeated for as many items as you wish to "slice". Since we already found the "first" item in the initial grouping, that means n-1 iterations for the desired slice result.
The final steps are really just an optional illustration of converting everything back into arrays for the result as finally shown. So really just conditionally pushing items or false back by their matching position and finally "filtering" out all the false values so the end arrays have "six" and "five" members respectively.
So there is not a standard operator to accommodate this, and you cannot just "limit" the push to 5 or 10 or whatever items in the array. But if you really have to do it, then this is your best approach.
You could possibly approach this with mapReduce and forsake the aggregation framework all together. The approach I would take ( within reasonable limits ) would be to effectively have an in-memory hash-map on the server and accumulate arrays to that, while using JavaScript slice to "limit" the results:
db.messages.mapReduce(
function () {
if ( !stash.hasOwnProperty(this.conversation_ID) ) {
stash[this.conversation_ID] = [];
}
if ( stash[this.conversation_ID.length < maxLen ) {
stash[this.conversation_ID].push( this._id );
emit( this.conversation_ID, 1 );
}
},
function(key,values) {
return 1; // really just want to keep the keys
},
{
"scope": { "stash": {}, "maxLen": 10 },
"finalize": function(key,value) {
return { "msgs": stash[key] };
},
"out": { "inline": 1 }
}
)
So that just basically builds up the "in-memory" object matching the emitted "keys" with an array never exceeding the maximum size you want to fetch from your results. Additionally this does not even bother to "emit" the item when the maximum stack is met.
The reduce part actually does nothing other than essentially just reduce to "key" and a single value. So just in case our reducer did not get called, as would be true if only 1 value existed for a key, the finalize function takes care of mapping the "stash" keys to the final output.
The effectiveness of this varies on the size of the output, and JavaScript evaluation is certainly not fast, but possibly faster than processing large arrays in a pipeline.
Vote up the JIRA issues to actually have a "slice" operator or even a "limit" on "$push" and "$addToSet", which would both be handy. Personally hoping that at least some modification can be made to the $map operator to expose the "current index" value when processing. That would effectively allow "slicing" and other operations.
Really you would want to code this up to "generate" all of the required iterations. If the answer here gets enough love and/or other time pending that I have in tuits, then I might add some code to demonstrate how to do this. It is already a reasonably long response.
Code to generate pipeline:
var key = "$conversation_ID";
var val = "$_id";
var maxLen = 10;
var stack = [];
var pipe = [];
var fproj = { "$project": { "pos": { "$const": [] } } };
for ( var x = 1; x <= maxLen; x++ ) {
fproj["$project"][""+x] = 1;
fproj["$project"]["pos"]["$const"].push( x );
var rec = {
"$cond": [ { "$eq": [ "$pos", x ] }, "$"+x ]
};
if ( stack.length == 0 ) {
rec["$cond"].push( false );
} else {
lval = stack.pop();
rec["$cond"].push( lval );
}
stack.push( rec );
if ( x == 1) {
pipe.push({ "$group": {
"_id": key,
"1": { "$first": val },
"msgs": { "$push": val }
}});
} else {
pipe.push({ "$unwind": "$msgs" });
var proj = {
"$project": {
"msgs": 1
}
};
proj["$project"]["seen"] = { "$eq": [ "$"+(x-1), "$msgs" ] };
var grp = {
"$group": {
"_id": "$_id",
"msgs": {
"$push": {
"$cond": [ { "$not": "$seen" }, "$msgs", false ]
}
}
}
};
for ( n=x; n >= 1; n-- ) {
if ( n != x )
proj["$project"][""+n] = 1;
grp["$group"][""+n] = ( n == x ) ? { "$first": "$msgs" } : { "$first": "$"+n };
}
pipe.push( proj );
pipe.push({ "$sort": { "seen": 1 } });
pipe.push(grp);
}
}
pipe.push(fproj);
pipe.push({ "$unwind": "$pos" });
pipe.push({
"$group": {
"_id": "$_id",
"msgs": { "$push": stack[0] }
}
});
pipe.push({ "$unwind": "$msgs" });
pipe.push({ "$match": { "msgs": { "$ne": false } }});
pipe.push({
"$group": {
"_id": "$_id",
"msgs": { "$push": "$msgs" }
}
});
That builds the basic iterative approach up to maxLen with the steps from $unwind to $group. Also embedded in there are details of the final projections required and the "nested" conditional statement. The last is basically the approach taken on this question:
Does MongoDB's $in clause guarantee order?
Starting Mongo 4.4, the $group stage has a new aggregation operator $accumulator allowing custom accumulations of documents as they get grouped, via javascript user defined functions.
Thus, in order to only select n messages (for instance 2) for each conversation:
// { "conversationId" : 3, "messageId" : 14 }
// { "conversationId" : 5, "messageId" : 34 }
// { "conversationId" : 3, "messageId" : 39 }
// { "conversationId" : 3, "messageId" : 47 }
db.collection.aggregate([
{ $group: {
_id: "$conversationId",
messages: {
$accumulator: {
accumulateArgs: ["$messageId"],
init: function() { return [] },
accumulate:
function(messages, message) { return messages.concat(message).slice(0, 2); },
merge:
function(messages1, messages2) { return messages1.concat(messages2).slice(0, 2); },
lang: "js"
}
}
}}
])
// { "_id" : 5, "messages" : [ 34 ] }
// { "_id" : 3, "messages" : [ 14, 39 ] }
The accumulator:
accumulates on the field messageId (accumulateArgs)
is initialised to an empty array (init)
accumulates messageId items in an array and only keeps a maximum of 2 (accumulate and merge)
Starting in Mongo 5.2, it's a perfect use case for the new $topN aggregation accumulator:
// { "conversationId" : 3, "messageId" : 14 }
// { "conversationId" : 5, "messageId" : 34 }
// { "conversationId" : 3, "messageId" : 39 }
// { "conversationId" : 3, "messageId" : 47 }
db.collection.aggregate([
{ $group: {
_id: "$conversationId",
messages: { $topN: { n: 2, output: "$messageId", sortBy: { _id: 1 } } }
}}
])
// { "_id" : 5, "messages" : [ 34 ] }
// { "_id" : 3, "messages" : [ 14, 39 ] }
This applies a $topN group accumulation that:
takes for each group the top 2 (n: 2) elements
and for each grouped record extracts the field value (output: "$messageId")
the choice of the "top 2" is defined by sortBy: { _id: 1 } (that I chose to be _id since you didn't specify an order).
The $slice operator is not an aggregation operator so you can't do this (like I suggested in this answer, before the edit):
db.messages.aggregate([
{ $group : {_id:'$conversation_ID',msgs: { $push: { msgid:'$_id' }}}},
{ $project : { _id : 1, msgs : { $slice : 10 }}}]);
Neil's answer is very detailed, but you can use a slightly different approach (if it fits your use case). You can aggregate your results and output them to a new collection:
db.messages.aggregate([
{ $group : {_id:'$conversation_ID',msgs: { $push: { msgid:'$_id' }}}},
{ $out : "msgs_agg" }
]);
The $out operator will write the results of the aggregation to a new collection. You can then use a regular find query project your results with the $slice operator:
db.msgs_agg.find({}, { msgs : { $slice : 10 }});
For this test documents:
> db.messages.find().pretty();
{ "_id" : 1, "conversation_ID" : 123 }
{ "_id" : 2, "conversation_ID" : 123 }
{ "_id" : 3, "conversation_ID" : 123 }
{ "_id" : 4, "conversation_ID" : 123 }
{ "_id" : 5, "conversation_ID" : 123 }
{ "_id" : 7, "conversation_ID" : 1234 }
{ "_id" : 8, "conversation_ID" : 1234 }
{ "_id" : 9, "conversation_ID" : 1234 }
The result will be:
> db.msgs_agg.find({}, { msgs : { $slice : 10 }});
{ "_id" : 1234, "msgs" : [ { "msgid" : 7 }, { "msgid" : 8 }, { "msgid" : 9 } ] }
{ "_id" : 123, "msgs" : [ { "msgid" : 1 }, { "msgid" : 2 }, { "msgid" : 3 },
{ "msgid" : 4 }, { "msgid" : 5 } ] }
Edit
I assume this would mean duplicating the whole messages collection.
Isn't that overkill?
Well, obviously this approach won't scale with huge collections. But, since you're considering using large aggregation pipelines or large map-reduce jobs you probably won't use this for "real-time" requests.
There are many cons of this approach: 16 MB BSON limit if you're creating huge documents with aggregation, wasting disk space / memory with duplication, increased disk IO...
The pros of this approach: its simple to implement and thus easy to change. If your collection is rarely updated you can use this "out" collection like a cache. This way you wouldn't have to perform the aggregation operation multiple times and you could then even support "real-time" client requests on the "out" collection. To refresh your data, you can periodically do aggregation (e.g. in a background job that runs nightly).
Like it was said in the comments this isn't an easy problem and there isn't a perfect solution for this (yet!). I showed you another approach you can use, it's up to you to benchmark and decide what's most appropriate for your use case.
I hope this will work as you wanted:
db.messages.aggregate([
{ $group : {_id:'$conversation_ID',msgs: { $push: { msgid:'$_id' }}}},
{ $project : { _id : 1, msgs : { $slice : ["$msgid",0,10] }}}
]);
I have collection in mongodb (3.0):
{
_id: 1,
m: [{_id:11, _t: 'type1'},
{_id:12, _t: 'type2'},
{_id:13, _t: 'type3'}]
},
{
_id: 2,
m: [{_id:21, _t: 'type1'},
{_id:22, _t: 'type21'},
{_id:23, _t: 'type3'}]
}
I want to find documents with m attributes where m._t containing ['type1', 'type2'].
Like this:
{
_id: 1,
m: [{_id:11, _t: 'type1'},
{_id:12, _t: 'type2'}]
},
{
_id: 2,
m: [{_id:21, _t: 'type1'}]
}
I tried to use $ and $elemMatch, but couldn't get required result.
How to do it, using find()?
Help me, please! Thanks!
Because the $elemMatch operator limits the contents of the m array field from the query results to contain only the first element matching the $elemMatch condition, the following will only return the an array with the first matching elements
{
"_id" : 11,
"_t" : "type1"
}
and
{
"_id" : 21,
"_t" : "type1"
}
Query using $elemMatch projection:
db.collection.find(
{
"m._t": {
"$in": ["type1", "type2"]
}
},
{
"m": {
"$elemMatch": {
"_t": {
"$in": ["type1", "type2"]
}
}
}
}
)
Result:
/* 0 */
{
"_id" : 1,
"m" : [
{
"_id" : 11,
"_t" : "type1"
}
]
}
/* 1 */
{
"_id" : 2,
"m" : [
{
"_id" : 21,
"_t" : "type1"
}
]
}
One approach you can take is the aggregation framework, where your pipeline would consist of a $match operator, similar to the find query above to filter the initial stream of documents. The next pipeline step would be the crucial $unwind operator that "splits" the array elements to be further streamlined with another $match operator and then the final $group pipeline to restore the original data structure by using the accumulator operator $push.
The following illustrates this path:
db.collection.aggregate([
{
"$match": {
"m._t": {
"$in": ["type1", "type2"]
}
}
},
{
"$unwind": "$m"
},
{
"$match": {
"m._t": {
"$in": ["type1", "type2"]
}
}
},
{
"$group": {
"_id": "$_id",
"m": {
"$push": "$m"
}
}
}
])
Sample Output:
/* 0 */
{
"result" : [
{
"_id" : 2,
"m" : [
{
"_id" : 21,
"_t" : "type1"
}
]
},
{
"_id" : 1,
"m" : [
{
"_id" : 11,
"_t" : "type1"
},
{
"_id" : 12,
"_t" : "type2"
}
]
}
],
"ok" : 1
}
To get your "filtered" result, the $redact with the aggregation pipeline is the fastest way:
db.junk.aggregate([
{ "$match": { "m._t": { "$in": ["type1", "type2"] } } },
{ "$redact": {
"$cond": {
"if": {
"$or": [
{ "$eq": [ { "$ifNull": ["$_t", "type1"] }, "type1" ] },
{ "$eq": [ { "$ifNull": ["$_t", "type2"] }, "type2" ] }
],
},
"then": "$$DESCEND",
"else": "$$PRUNE"
}
}}
])
The $redact operator sets up a logical filter for the document that can also traverse into the array levels. Note that this is matching on _t at all levels of the document, so make sure there are no other elements sharing this name.
The query uses $in for selection just as the logical filter uses $or. Anything that does not match, gets "pruned".
{
"_id" : 1,
"m" : [
{
"_id" : 11,
"_t" : "type1"
},
{
"_id" : 12,
"_t" : "type2"
}
]
}
{
"_id" : 2,
"m" : [ { "_id" : 21, "_t" : "type1" } ]
}
Short and sweet and simple.
A bit more cumbersome, but a reasonably safer is to use this construct with $map and $setDifference to filter results:
db.junk.aggregate([
{ "$match": { "m._t": { "$in": ["type1", "type2"] } } },
{ "$project": {
"m": {
"$setDifference": [
{ "$map": {
"input": "$m",
"as": "el",
"in": {
"$cond": {
"if": {
"$or": [
{ "$eq": [ "$$el._t", "type1" ] },
{ "$eq": [ "$$el._t", "type2" ] }
]
},
"then": "$$el",
"else": false
}
}
}},
[false]
]
}
}}
])
The $map evaluates the conditions against each element and the $setDifference removes any of those condtions that returned false rather than the array content. Very similar to the $cond in redact above, but it is just working specifically with the one array and not the whole document.
In future MongoDB releases ( currently available in development releases ) there will be the $filter operator, which is very simple to follow:
db.junk.aggregate([
{ "$match": { "m._t": { "$in": ["type1", "type2"] } } },
{ "$project": {
"m": {
"$filter": {
"input": "$m",
"as": "el",
"cond": {
"$or": [
{ "$eq": [ "$$el._t", "type1" ] },
{ "$eq": [ "$$el._t", "type2" ] }
]
}
}
}
}}
])
And that will simply remove any array element that does not match the specified conditions.
If you want to filter array content on the server, the aggregation framework is the way to do it.
I have collection of products. Each product contains array of items.
> db.products.find().pretty()
{
"_id" : ObjectId("54023e8bcef998273f36041d"),
"shop" : "shop1",
"name" : "product1",
"items" : [
{
"date" : "01.02.2100",
"purchasePrice" : 1,
"sellingPrice" : 10,
"count" : 15
},
{
"date" : "31.08.2014",
"purchasePrice" : 10,
"sellingPrice" : 1,
"count" : 5
}
]
}
So, can you please give me an advice, how I can query MongoDB to retrieve all products with only single item which date is equals to the date I pass to query as parameter.
The result for "31.08.2014" must be:
{
"_id" : ObjectId("54023e8bcef998273f36041d"),
"shop" : "shop1",
"name" : "product1",
"items" : [
{
"date" : "31.08.2014",
"purchasePrice" : 10,
"sellingPrice" : 1,
"count" : 5
}
]
}
What you are looking for is the positional $ operator and "projection". For a single field you need to match the required array element using "dot notation", for more than one field use $elemMatch:
db.products.find(
{ "items.date": "31.08.2014" },
{ "shop": 1, "name":1, "items.$": 1 }
)
Or the $elemMatch for more than one matching field:
db.products.find(
{ "items": {
"$elemMatch": { "date": "31.08.2014", "purchasePrice": 1 }
}},
{ "shop": 1, "name":1, "items.$": 1 }
)
These work for a single array element only though and only one will be returned. If you want more than one array element to be returned from your conditions then you need more advanced handling with the aggregation framework.
db.products.aggregate([
{ "$match": { "items.date": "31.08.2014" } },
{ "$unwind": "$items" },
{ "$match": { "items.date": "31.08.2014" } },
{ "$group": {
"_id": "$_id",
"shop": { "$first": "$shop" },
"name": { "$first": "$name" },
"items": { "$push": "$items" }
}}
])
Or possibly in shorter/faster form since MongoDB 2.6 where your array of items contains unique entries:
db.products.aggregate([
{ "$match": { "items.date": "31.08.2014" } },
{ "$project": {
"shop": 1,
"name": 1,
"items": {
"$setDifference": [
{ "$map": {
"input": "$items",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.date", "31.08.2014" ] },
"$$el",
false
]
}
}},
[false]
]
}
}}
])
Or possibly with $redact, but a little contrived:
db.products.aggregate([
{ "$match": { "items.date": "31.08.2014" } },
{ "$redact": {
"$cond": [
{ "$eq": [ { "$ifNull": [ "$date", "31.08.2014" ] }, "31.08.2014" ] },
"$$DESCEND",
"$$PRUNE"
]
}}
])
More modern, you would use $filter:
db.products.aggregate([
{ "$match": { "items.date": "31.08.2014" } },
{ "$addFields": {
"items": {
"input": "$items",
"cond": { "$eq": [ "$$this.date", "31.08.2014" ] }
}
}}
])
And with multiple conditions, the $elemMatch and $and within the $filter:
db.products.aggregate([
{ "$match": {
"$elemMatch": { "date": "31.08.2014", "purchasePrice": 1 }
}},
{ "$addFields": {
"items": {
"input": "$items",
"cond": {
"$and": [
{ "$eq": [ "$$this.date", "31.08.2014" ] },
{ "$eq": [ "$$this.purchasePrice", 1 ] }
]
}
}
}}
])
So it just depends on whether you always expect a single element to match or multiple elements, and then which approach is better. But where possible the .find() method will generally be faster since it lacks the overhead of the other operations, which in those last to forms does not lag that far behind at all.
As a side note, your "dates" are represented as strings which is not a very good idea going forward. Consider changing these to proper Date object types, which will greatly help you in the future.
Based on Neil Lunn's code I work with this solution, it includes automatically all first level keys (but you could also exclude keys if you want):
db.products.find(
{ "items.date": "31.08.2014" },
{ "shop": 1, "name":1, "items.$": 1 }
{ items: { $elemMatch: { date: "31.08.2014" } } },
)
With multiple requirements:
db.products.find(
{ "items": {
"$elemMatch": { "date": "31.08.2014", "purchasePrice": 1 }
}},
{ items: { $elemMatch: { "date": "31.08.2014", "purchasePrice": 1 } } },
)
Mongo supports dot notation for sub-queries.
See: http://docs.mongodb.org/manual/reference/glossary/#term-dot-notation
Depending on your driver, you want something like:
db.products.find({"items.date":"31.08.2014"});
Note that the attribute is in quotes for dot notation, even if usually your driver doesn't require this.