MongoDB multidimensional array projection - mongodb

I just started learning MongoDB and can't find a solution for my problem.
Got that document:
> db.test.insert({"name" : "Anika", "arr" : [ [11, 22],[33,44] ] })
Please note the "arr" field which is a multidimensional array.
Now I'm looking for a query that returns only the value of arr[0][1] which is 22. I tried to achieve that by using $slice, however I don't know how to address the second dimension with that.
> db.test.find({},{_id:0,"arr":{$slice: [0,1]}})
{ "name" : "Anika", "arr" : [ [ 11, 22 ] ] }
I also tried
> db.test.find({},{_id:0,"arr":{$slice: [0,1][1,1]}})
{ "name" : "Anika", "arr" : [ [ 11, 22 ] ] }
The desired output would be either
22
or
{"arr":[[22]]}
Thank you
EDIT:
After reading the comments I think that I've simplified the example data too much and I have to provide more information:
There are many more documents in the collection like that one that
I've provided. But they all have the same structure.
There are more array elements than just two
In the real world the array contains really long texts (500kb-1mb),
so it is very expensive to transmit the whole data to the client.
Before the aggregation I will do a query by the 'name' field. Just
skipped that in the example for the sake of simplicity.
The target indexes are variable, so sometimes I need to know the
value of arr[0][1], the next time it is arr[1][4]
example data:
> db.test.insert({"name" : "Olivia", "arr" : [ [11, 22, 33, 44],[55,66,77,88],[99] ] })
> db.test.insert({"name" : "Walter", "arr" : [ [11], [22, 33, 44],[55,66,77,88],[99] ] })
> db.test.insert({"name" : "Astrid", "arr" : [ [11, 22, 33, 44],[55,66],[77,88],[99] ] })
> db.test.insert({"name" : "Peter", "arr" : [ [11, 22, 33, 44],[55,66,77,88],[99] ] })
example query:
> db.test.find({name:"Olivia"},{"arr": ...})

You can use the aggregation framework:
db.test.aggregate([
{ $unwind: '$arr' },
{ $limit: 1 },
{ $project: { _id: 0, arr: 1 } },
{ $unwind: '$arr' },
{ $skip: 1 },
{ $limit: 1 }
])
Returns:
{ "arr": 22 }
Edit: The original poster has modified my solution to suit his needs and came up with the following:
db.test.aggregate([
{ $match: { name:"Olivia" } },
{ $project: { _id: 0,arr: 1 } },
{ $unwind: '$arr' },
{ $skip: 1 },
{ $limit:1 },
{ $unwind: "$arr" },
{ $skip: 2 },
{ $limit: 1 }
])
This query will result in { arr: 77 } given the extended data provided by the OP. Note that $skip and $limit are needed to select the right elements in the array hierarchy.

The $slice form you ask for does not handle multi-dimensional arrays. Each array is considered individually, so addressing a second dimension is not supported by the current $slice.
For the "first" and "last" positions, though, this can presently be done with a much shorter .aggregate() pipeline than has been suggested:
db.test.aggregate([
{ "$unwind": "$arr" },
{ "$group": {
"_id": "$_id",
"arr": { "$first": "$arr" }
}},
{ "$unwind": "$arr" },
{ "$group": {
"_id": "$_id",
"arr": { "$last": "$arr" }
}}
])
But in future releases of MongoDB (currently working in development branch 3.1.8 as of writing) you have $arrayElemAt as an operator for the aggregation framework, which works like this:
db.test.aggregate([
{ "$project": {
"arr": {
"$arrayElemAt": [
{ "$arrayElemAt": [ "$arr", 0 ] },
1
]
}
}}
])
Both basically come to the same { "arr": 22 } result, though the future available form works quite flexibly on array index values, rather than first and last.
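Since the target indexes are variable, the nested $arrayElemAt expression can be generated rather than written by hand. The following is only a sketch in plain JavaScript: elemAtProjection is a hypothetical helper name, not part of MongoDB, and it only builds the stage document without talking to a server.

```javascript
// Hypothetical helper (not a MongoDB API): builds a $project stage that
// selects field[i][j]... by nesting one $arrayElemAt per index.
function elemAtProjection(field, indexes) {
  // Start from the field path and wrap it once for every index.
  const expr = indexes.reduce(
    (inner, index) => ({ $arrayElemAt: [inner, index] }),
    "$" + field
  );
  return { $project: { _id: 0, [field]: expr } };
}

// Stage for arr[1][2], which is 77 in the "Olivia" sample document.
const stage = elemAtProjection("arr", [1, 2]);
console.log(JSON.stringify(stage));

// With a live connection the stage could be used like this:
// db.test.aggregate([ { $match: { name: "Olivia" } }, stage ])
```

This requires a server version that has $arrayElemAt. Only the index list changes between calls.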

Related

mongo - count return no document found instead of 0

In SQL query
select count(*) from table where id=1
would return 0 as result where there isn't any record with such id.
I would like to get exactly the same behavior but in mongo. Unfortunately I can only use aggregate function.
I was trying something like this
db.collection.aggregate([
{
"$match": {
"key": 1
}
},
{
$count: "s"
}
])
It works but only with records with key:1 but when this key does not exist there is "no document found"
You can use this aggregation query, which uses $facet to create two branches: one for when the document exists and one for when it does not.
First, $facet creates the two branches.
In the notFound branch the result will always be {count: 0}; the found branch contains the $match.
Then $replaceRoot merges the results to get the desired value.
db.collection.aggregate([
{
"$facet": {
"notFound": [
{
"$project": {
"_id": 0,
"count": {
"$const": 0
}
}
},
{
"$limit": 1
}
],
"found": [
{
"$match": {
"key": 1
}
},
{
"$count": "count"
}
]
}
},
{
"$replaceRoot": {
"newRoot": {
"$mergeObjects": [
{
"$arrayElemAt": [
"$notFound",
0
]
},
{
"$arrayElemAt": [
"$found",
0
]
}
]
}
}
}
])
Example here where the key exists and here where the key doesn't exist.
I've also tested this using $ifNull instead of $mergeObjects, and it seems to work too.
I think the right way to do it is in the driver code: if you get empty results, you build the document {"count" : 0} yourself. I don't think you need to do anything in the database.
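The driver-side fallback could be sketched as below. This is plain JavaScript under stated assumptions: countOrZero is a hypothetical helper, and results stands for the array a driver returns after running the [ $match, $count ] pipeline from the question.

```javascript
// Hypothetical helper: `results` is assumed to be the array of documents
// returned by [ { $match: { key: 1 } }, { $count: "count" } ].
function countOrZero(results) {
  // $count emits no document at all when nothing matched,
  // so an empty array is turned into an explicit zero.
  return results.length > 0 ? results[0] : { count: 0 };
}

console.log(countOrZero([]));             // { count: 0 }
console.log(countOrZero([{ count: 3 }])); // { count: 3 }
```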
Another solution can be this (replace the 5 with the key value you want). It:
creates 2 groups, the matched (count>0) and the not matched (count=0)
sorts by {"count" : -1}
takes the first: if there was a match, count will be the matched one, else it will be 0
db.collection.aggregate(
[ {
"$group" : {
"_id" : {
"$cond" : [ {"$eq" : [ "$key", 5 ]}, "$key", "not_match" ]
},
"count" : {
"$sum" : {"$cond" : [ {"$eq" : [ "$key", 5 ]}, 1, 0 ]}
}
}
},
{"$sort" : {"count" : -1}},
{
"$group" : {
"_id" : null,
"count" : {"$first" : "$count"}
}
},
{"$project" : {"_id" : 0}}
])
I did it by using $facet and $project. When there were no documents to project it was showing undefined, so I used the $ifNull expression, with zero as the replacement value (see the $ifNull docs).
db.collection.aggregate([
{
"$facet": {
"keyFound": [
{
"$match": {
"key": 1
}
},
{
"$count": "count"
}
]
}
},
{
"$project": {
"keyFoundCount": {
"$ifNull": [
{
"$arrayElemAt": [
"$keyFound.count",
0
]
},
0
]
}
}
}
])

Limit results in a Mongo Aggregation [duplicate]

I want to group all the documents according to a field but to restrict the number of documents grouped for each value.
Each message has a conversation_ID. I need to get 10 or fewer messages for each conversation_ID.
I am able to group with the following command, but can't figure out how to restrict the number of grouped documents apart from slicing the results:
Message.aggregate({'$group':{_id:'$conversation_ID',msgs:{'$push':{msgid:'$_id'}}}})
How to limit the length of msgs array for each conversation_ID to 10?
Modern
From MongoDB 3.6 there is a "novel" approach to this by using $lookup to perform a "self join" in much the same way as the original cursor processing demonstrated below.
Since in this release you can specify a "pipeline" argument to $lookup as a source for the "join", this essentially means you can use $match and $limit to gather and "limit" the entries for the array:
db.messages.aggregate([
{ "$group": { "_id": "$conversation_ID" } },
{ "$lookup": {
"from": "messages",
"let": { "conversation": "$_id" },
"pipeline": [
{ "$match": { "$expr": { "$eq": [ "$conversation_ID", "$$conversation" ] } }},
{ "$limit": 10 },
{ "$project": { "_id": 1 } }
],
"as": "msgs"
}}
])
You can optionally add additional projection after the $lookup in order to make the array items simply the values rather than documents with an _id key, but the basic result is there by simply doing the above.
There is still the outstanding SERVER-9277 which actually requests a "limit to push" directly, but using $lookup in this way is a viable alternative in the interim.
NOTE: There also is $slice which was introduced after writing the original answer and mentioned by "outstanding JIRA issue" in the original content. Whilst you can get the same result with small result sets, it does involve still "pushing everything" into the array and then later limiting the final array output to the desired length.
So that's the main distinction, and why it's generally not practical to use $slice for large results. It can of course be used as an alternative in cases where it is practical.
There are a few more details on mongodb group values by multiple fields about either alternate usage.
Original
As stated earlier, this is not impossible but certainly a horrible problem.
Actually if your main concern is that your resulting arrays are going to be exceptionally large, then your best approach is to submit a query for each distinct "conversation_ID" individually and then combine your results. In MongoDB 2.6 syntax, which might need some tweaking depending on what your language implementation actually is:
var results = [];
db.messages.aggregate([
{ "$group": {
"_id": "$conversation_ID"
}}
]).forEach(function(doc) {
db.messages.aggregate([
{ "$match": { "conversation_ID": doc._id } },
{ "$limit": 10 },
{ "$group": {
"_id": "$conversation_ID",
"msgs": { "$push": "$_id" }
}}
]).forEach(function(res) {
results.push( res );
});
});
But it all depends on whether that is what you are trying to avoid. So on to the real answer:
The first issue here is that there is no function to "limit" the number of items that are "pushed" into an array. It is certainly something we would like, but the functionality does not presently exist.
The second issue is that even when pushing all items into an array, you cannot use $slice, or any similar operator in the aggregation pipeline. So there is no present way to get just the "top 10" results from a produced array with a simple operation.
But you can actually produce a set of operations to effectively "slice" on your grouping boundaries. It is fairly involved, and for example here I will reduce the array elements "sliced" to "six" only. The main reason here is to demonstrate the process and show how to do this without being destructive with arrays that do not contain the total you want to "slice" to.
Given a sample of documents:
{ "_id" : 1, "conversation_ID" : 123 }
{ "_id" : 2, "conversation_ID" : 123 }
{ "_id" : 3, "conversation_ID" : 123 }
{ "_id" : 4, "conversation_ID" : 123 }
{ "_id" : 5, "conversation_ID" : 123 }
{ "_id" : 6, "conversation_ID" : 123 }
{ "_id" : 7, "conversation_ID" : 123 }
{ "_id" : 8, "conversation_ID" : 123 }
{ "_id" : 9, "conversation_ID" : 123 }
{ "_id" : 10, "conversation_ID" : 123 }
{ "_id" : 11, "conversation_ID" : 123 }
{ "_id" : 12, "conversation_ID" : 456 }
{ "_id" : 13, "conversation_ID" : 456 }
{ "_id" : 14, "conversation_ID" : 456 }
{ "_id" : 15, "conversation_ID" : 456 }
{ "_id" : 16, "conversation_ID" : 456 }
You can see there that when grouping by your conditions you will get one array with eleven elements and another with "five". What you want to do here is reduce both to the top "six" without "destroying" the array that will only match "five" elements.
And the following query:
db.messages.aggregate([
{ "$group": {
"_id": "$conversation_ID",
"first": { "$first": "$_id" },
"msgs": { "$push": "$_id" },
}},
{ "$unwind": "$msgs" },
{ "$project": {
"msgs": 1,
"first": 1,
"seen": { "$eq": [ "$first", "$msgs" ] }
}},
{ "$sort": { "seen": 1 }},
{ "$group": {
"_id": "$_id",
"msgs": {
"$push": {
"$cond": [ { "$not": "$seen" }, "$msgs", false ]
}
},
"first": { "$first": "$first" },
"second": { "$first": "$msgs" }
}},
{ "$unwind": "$msgs" },
{ "$project": {
"msgs": 1,
"first": 1,
"second": 1,
"seen": { "$eq": [ "$second", "$msgs" ] }
}},
{ "$sort": { "seen": 1 }},
{ "$group": {
"_id": "$_id",
"msgs": {
"$push": {
"$cond": [ { "$not": "$seen" }, "$msgs", false ]
}
},
"first": { "$first": "$first" },
"second": { "$first": "$second" },
"third": { "$first": "$msgs" }
}},
{ "$unwind": "$msgs" },
{ "$project": {
"msgs": 1,
"first": 1,
"second": 1,
"third": 1,
"seen": { "$eq": [ "$third", "$msgs" ] },
}},
{ "$sort": { "seen": 1 }},
{ "$group": {
"_id": "$_id",
"msgs": {
"$push": {
"$cond": [ { "$not": "$seen" }, "$msgs", false ]
}
},
"first": { "$first": "$first" },
"second": { "$first": "$second" },
"third": { "$first": "$third" },
"forth": { "$first": "$msgs" }
}},
{ "$unwind": "$msgs" },
{ "$project": {
"msgs": 1,
"first": 1,
"second": 1,
"third": 1,
"forth": 1,
"seen": { "$eq": [ "$forth", "$msgs" ] }
}},
{ "$sort": { "seen": 1 }},
{ "$group": {
"_id": "$_id",
"msgs": {
"$push": {
"$cond": [ { "$not": "$seen" }, "$msgs", false ]
}
},
"first": { "$first": "$first" },
"second": { "$first": "$second" },
"third": { "$first": "$third" },
"forth": { "$first": "$forth" },
"fifth": { "$first": "$msgs" }
}},
{ "$unwind": "$msgs" },
{ "$project": {
"msgs": 1,
"first": 1,
"second": 1,
"third": 1,
"forth": 1,
"fifth": 1,
"seen": { "$eq": [ "$fifth", "$msgs" ] }
}},
{ "$sort": { "seen": 1 }},
{ "$group": {
"_id": "$_id",
"msgs": {
"$push": {
"$cond": [ { "$not": "$seen" }, "$msgs", false ]
}
},
"first": { "$first": "$first" },
"second": { "$first": "$second" },
"third": { "$first": "$third" },
"forth": { "$first": "$forth" },
"fifth": { "$first": "$fifth" },
"sixth": { "$first": "$msgs" },
}},
{ "$project": {
"first": 1,
"second": 1,
"third": 1,
"forth": 1,
"fifth": 1,
"sixth": 1,
"pos": { "$const": [ 1,2,3,4,5,6 ] }
}},
{ "$unwind": "$pos" },
{ "$group": {
"_id": "$_id",
"msgs": {
"$push": {
"$cond": [
{ "$eq": [ "$pos", 1 ] },
"$first",
{ "$cond": [
{ "$eq": [ "$pos", 2 ] },
"$second",
{ "$cond": [
{ "$eq": [ "$pos", 3 ] },
"$third",
{ "$cond": [
{ "$eq": [ "$pos", 4 ] },
"$forth",
{ "$cond": [
{ "$eq": [ "$pos", 5 ] },
"$fifth",
{ "$cond": [
{ "$eq": [ "$pos", 6 ] },
"$sixth",
false
]}
]}
]}
]}
]}
]
}
}
}},
{ "$unwind": "$msgs" },
{ "$match": { "msgs": { "$ne": false } }},
{ "$group": {
"_id": "$_id",
"msgs": { "$push": "$msgs" }
}}
])
You get the top results in the array, up to six entries:
{ "_id" : 123, "msgs" : [ 1, 2, 3, 4, 5, 6 ] }
{ "_id" : 456, "msgs" : [ 12, 13, 14, 15, 16 ] }
As you can see here, loads of fun.
After you have initially grouped you basically want to "pop" the $first value off of the stack for the array results. To make this process simplified a little, we actually do this in the initial operation. So the process becomes:
$unwind the array
Compare to the values already seen with an $eq equality match
$sort the results to "float" false unseen values to the top ( this still retains order )
$group back again and "pop" the $first unseen value as the next member on the stack. Also this uses the $cond operator to replace "seen" values in the array stack with false to help in the evaluation.
The final action with $cond is there to make sure that future iterations are not just adding the last value of the array over and over where the "slice" count is greater than the array members.
That whole process needs to be repeated for as many items as you wish to "slice". Since we already found the "first" item in the initial grouping, that means n-1 iterations for the desired slice result.
The final steps are really just an optional illustration of converting everything back into arrays for the result as finally shown. So really just conditionally pushing items or false back by their matching position and finally "filtering" out all the false values so the end arrays have "six" and "five" members respectively.
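The end result described above (at most "six" entries per group, shorter groups left intact) can be cross-checked client-side. This is only a plain JavaScript stand-in for the pipeline's outcome, not the pipeline itself; firstNPerGroup is a hypothetical helper name and the documents mirror the sample data.

```javascript
// Sample documents as above: ids 1-11 in conversation 123, 12-16 in 456.
const docs = [];
for (let id = 1; id <= 16; id++) {
  docs.push({ _id: id, conversation_ID: id <= 11 ? 123 : 456 });
}

// Hypothetical helper: keeps the first n message ids per conversation.
function firstNPerGroup(input, n) {
  const groups = new Map();
  for (const doc of input) {
    if (!groups.has(doc.conversation_ID)) {
      groups.set(doc.conversation_ID, []);
    }
    const msgs = groups.get(doc.conversation_ID);
    // Stop collecting once the group already holds n entries.
    if (msgs.length < n) {
      msgs.push(doc._id);
    }
  }
  return [...groups].map(([id, msgs]) => ({ _id: id, msgs }));
}

console.log(JSON.stringify(firstNPerGroup(docs, 6)));
// [{"_id":123,"msgs":[1,2,3,4,5,6]},{"_id":456,"msgs":[12,13,14,15,16]}]
```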
So there is not a standard operator to accommodate this, and you cannot just "limit" the push to 5 or 10 or whatever items in the array. But if you really have to do it, then this is your best approach.
You could possibly approach this with mapReduce and forsake the aggregation framework altogether. The approach I would take ( within reasonable limits ) would be to effectively have an in-memory hash-map on the server and accumulate arrays to that, while using JavaScript slice to "limit" the results:
db.messages.mapReduce(
function () {
if ( !stash.hasOwnProperty(this.conversation_ID) ) {
stash[this.conversation_ID] = [];
}
if ( stash[this.conversation_ID].length < maxLen ) {
stash[this.conversation_ID].push( this._id );
emit( this.conversation_ID, 1 );
}
},
function(key,values) {
return 1; // really just want to keep the keys
},
{
"scope": { "stash": {}, "maxLen": 10 },
"finalize": function(key,value) {
return { "msgs": stash[key] };
},
"out": { "inline": 1 }
}
)
So that just basically builds up the "in-memory" object matching the emitted "keys" with an array never exceeding the maximum size you want to fetch from your results. Additionally this does not even bother to "emit" the item when the maximum stack is met.
The reduce part actually does nothing other than essentially just reduce to "key" and a single value. So just in case our reducer did not get called, as would be true if only 1 value existed for a key, the finalize function takes care of mapping the "stash" keys to the final output.
The effectiveness of this varies on the size of the output, and JavaScript evaluation is certainly not fast, but possibly faster than processing large arrays in a pipeline.
Vote up the JIRA issues to actually have a "slice" operator or even a "limit" on "$push" and "$addToSet", which would both be handy. Personally hoping that at least some modification can be made to the $map operator to expose the "current index" value when processing. That would effectively allow "slicing" and other operations.
Really you would want to code this up to "generate" all of the required iterations. If the answer here gets enough love and/or other time pending that I have in tuits, then I might add some code to demonstrate how to do this. It is already a reasonably long response.
Code to generate pipeline:
var key = "$conversation_ID";
var val = "$_id";
var maxLen = 10;
var stack = [];
var pipe = [];
var fproj = { "$project": { "pos": { "$const": [] } } };
for ( var x = 1; x <= maxLen; x++ ) {
fproj["$project"][""+x] = 1;
fproj["$project"]["pos"]["$const"].push( x );
var rec = {
"$cond": [ { "$eq": [ "$pos", x ] }, "$"+x ]
};
if ( stack.length == 0 ) {
rec["$cond"].push( false );
} else {
var lval = stack.pop();
rec["$cond"].push( lval );
}
stack.push( rec );
if ( x == 1) {
pipe.push({ "$group": {
"_id": key,
"1": { "$first": val },
"msgs": { "$push": val }
}});
} else {
pipe.push({ "$unwind": "$msgs" });
var proj = {
"$project": {
"msgs": 1
}
};
proj["$project"]["seen"] = { "$eq": [ "$"+(x-1), "$msgs" ] };
var grp = {
"$group": {
"_id": "$_id",
"msgs": {
"$push": {
"$cond": [ { "$not": "$seen" }, "$msgs", false ]
}
}
}
};
for ( var n = x; n >= 1; n-- ) {
if ( n != x )
proj["$project"][""+n] = 1;
grp["$group"][""+n] = ( n == x ) ? { "$first": "$msgs" } : { "$first": "$"+n };
}
pipe.push( proj );
pipe.push({ "$sort": { "seen": 1 } });
pipe.push(grp);
}
}
pipe.push(fproj);
pipe.push({ "$unwind": "$pos" });
pipe.push({
"$group": {
"_id": "$_id",
"msgs": { "$push": stack[0] }
}
});
pipe.push({ "$unwind": "$msgs" });
pipe.push({ "$match": { "msgs": { "$ne": false } }});
pipe.push({
"$group": {
"_id": "$_id",
"msgs": { "$push": "$msgs" }
}
});
That builds the basic iterative approach up to maxLen with the steps from $unwind to $group. Also embedded in there are details of the final projections required and the "nested" conditional statement. The last is basically the approach taken on this question:
Does MongoDB's $in clause guarantee order?
Starting in Mongo 4.4, the $group stage has a new aggregation operator, $accumulator, which allows custom accumulation of documents as they get grouped, via user-defined JavaScript functions.
Thus, in order to only select n messages (for instance 2) for each conversation:
// { "conversationId" : 3, "messageId" : 14 }
// { "conversationId" : 5, "messageId" : 34 }
// { "conversationId" : 3, "messageId" : 39 }
// { "conversationId" : 3, "messageId" : 47 }
db.collection.aggregate([
{ $group: {
_id: "$conversationId",
messages: {
$accumulator: {
accumulateArgs: ["$messageId"],
init: function() { return [] },
accumulate:
function(messages, message) { return messages.concat(message).slice(0, 2); },
merge:
function(messages1, messages2) { return messages1.concat(messages2).slice(0, 2); },
lang: "js"
}
}
}}
])
// { "_id" : 5, "messages" : [ 34 ] }
// { "_id" : 3, "messages" : [ 14, 39 ] }
The accumulator:
accumulates on the field messageId (accumulateArgs)
is initialised to an empty array (init)
accumulates messageId items in an array and only keeps a maximum of 2 (accumulate and merge)
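To see what the accumulator functions do, they can be run by hand outside MongoDB. The sketch below reuses the three functions unchanged and feeds them the messages of conversation 3. The loop and the merge call are illustrative assumptions about the order in which a server might invoke them.

```javascript
// The same three functions as passed to $accumulator above.
const init = function () { return []; };
const accumulate = function (messages, message) {
  return messages.concat(message).slice(0, 2);
};
const merge = function (messages1, messages2) {
  return messages1.concat(messages2).slice(0, 2);
};

// Conversation 3 from the sample data, fed one message at a time.
let state = init();
for (const messageId of [14, 39, 47]) {
  state = accumulate(state, messageId);
}
console.log(state); // [ 14, 39 ]

// merge combines two partial states into one, still capped at 2 entries.
console.log(merge([14], [39, 47])); // [ 14, 39 ]
```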
Starting in Mongo 5.2, it's a perfect use case for the new $topN aggregation accumulator:
// { "conversationId" : 3, "messageId" : 14 }
// { "conversationId" : 5, "messageId" : 34 }
// { "conversationId" : 3, "messageId" : 39 }
// { "conversationId" : 3, "messageId" : 47 }
db.collection.aggregate([
{ $group: {
_id: "$conversationId",
messages: { $topN: { n: 2, output: "$messageId", sortBy: { _id: 1 } } }
}}
])
// { "_id" : 5, "messages" : [ 34 ] }
// { "_id" : 3, "messages" : [ 14, 39 ] }
This applies a $topN group accumulation that:
takes for each group the top 2 (n: 2) elements
and for each grouped record extracts the field value (output: "$messageId")
the choice of the "top 2" is defined by sortBy: { _id: 1 } (that I chose to be _id since you didn't specify an order).
The $slice operator is not an aggregation operator so you can't do this (like I suggested in this answer, before the edit):
db.messages.aggregate([
{ $group : {_id:'$conversation_ID',msgs: { $push: { msgid:'$_id' }}}},
{ $project : { _id : 1, msgs : { $slice : 10 }}}]);
Neil's answer is very detailed, but you can use a slightly different approach (if it fits your use case). You can aggregate your results and output them to a new collection:
db.messages.aggregate([
{ $group : {_id:'$conversation_ID',msgs: { $push: { msgid:'$_id' }}}},
{ $out : "msgs_agg" }
]);
The $out operator will write the results of the aggregation to a new collection. You can then use a regular find query to project your results with the $slice operator:
db.msgs_agg.find({}, { msgs : { $slice : 10 }});
For these test documents:
> db.messages.find().pretty();
{ "_id" : 1, "conversation_ID" : 123 }
{ "_id" : 2, "conversation_ID" : 123 }
{ "_id" : 3, "conversation_ID" : 123 }
{ "_id" : 4, "conversation_ID" : 123 }
{ "_id" : 5, "conversation_ID" : 123 }
{ "_id" : 7, "conversation_ID" : 1234 }
{ "_id" : 8, "conversation_ID" : 1234 }
{ "_id" : 9, "conversation_ID" : 1234 }
The result will be:
> db.msgs_agg.find({}, { msgs : { $slice : 10 }});
{ "_id" : 1234, "msgs" : [ { "msgid" : 7 }, { "msgid" : 8 }, { "msgid" : 9 } ] }
{ "_id" : 123, "msgs" : [ { "msgid" : 1 }, { "msgid" : 2 }, { "msgid" : 3 },
{ "msgid" : 4 }, { "msgid" : 5 } ] }
Edit
I assume this would mean duplicating the whole messages collection.
Isn't that overkill?
Well, obviously this approach won't scale with huge collections. But, since you're considering using large aggregation pipelines or large map-reduce jobs you probably won't use this for "real-time" requests.
There are many cons of this approach: 16 MB BSON limit if you're creating huge documents with aggregation, wasting disk space / memory with duplication, increased disk IO...
The pros of this approach: it's simple to implement and thus easy to change. If your collection is rarely updated you can use this "out" collection like a cache. This way you wouldn't have to perform the aggregation operation multiple times and you could then even support "real-time" client requests on the "out" collection. To refresh your data, you can periodically do aggregation (e.g. in a background job that runs nightly).
Like it was said in the comments this isn't an easy problem and there isn't a perfect solution for this (yet!). I showed you another approach you can use, it's up to you to benchmark and decide what's most appropriate for your use case.
I hope this will work as you wanted (the $slice aggregation operator requires MongoDB 3.2 or later):
db.messages.aggregate([
{ $group : {_id:'$conversation_ID',msgs: { $push: { msgid:'$_id' }}}},
{ $project : { _id : 1, msgs : { $slice : ["$msgs",0,10] }}}
]);

Can I use more than 2 fields on a MongoDB aggregation framework $sort?

I am using the following PyMongo query. I used some tips from a Mongo webinar, where they advised using the _id field to store a timestamp in order to improve performance and memory usage.
cursor = db.dados_meteo_reloaded.aggregate( [
{
"$match": {
"_id": {
"$gte": "0001:20120901",
"$lte": "0001:20140215"
},
"TMP": {
"$lt": 7.2
}
}
},
{
"$project": {
"year": {
"$substr": [
"$_id",
5,
4
]
},
"month": {
"$substr": [
"$_id",
9,
2
]
},
"day": {
"$substr": [
"$_id",
11,
2
]
}
}
},
{
"$group": {
"_id": {"year":"$year","month":"$month","day":"$day"},
"frio": {
"$sum": 0.25
}
}
},
{"$sort":{"_id.year":1, "_id.month":1, "_id.day":1}}
])
I get a result that is only sorted by day. When, in the $sort step of the pipeline, I use only
{"$sort":{"_id.year":1, "_id.month":1}}
the result comes back sorted by year and month correctly. Is there some limit on how many fields can be used in the $sort step?
Here are some example documents
{
"_id" : "0001:20121201000000",
"RNF" : 0,
"WET" : 8,
"HMD" : 100,
"TMP" : 4.4
},
{
"_id" : "0001:20121201001500",
"RNF" : 0,
"WET" : 7.9,
"HMD" : 100,
"TMP" : 4.2
}
Quoting the documentation:
The $sort stage has the following prototype form:
{ $sort: { <field1>: <sort order>, <field2>: <sort order> ... } }
So there is no limit on how many fields can be used in the $sort stage.
However, there are memory restrictions:
The $sort stage has a limit of 100 megabytes of RAM. By default, if the stage exceeds this limit, $sort will produce an error. To allow for the handling of large datasets, set the allowDiskUse option to true to enable $sort operations to write to temporary files.
In PyMongo the syntax to use the allowDiskUse option is:
collection.aggregate(
[
{ '$sort': { <field1>: <sort order>, <field2>: <sort order> ... } }
],
allowDiskUse = True
)
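A $sort on several fields compares them in the order they are listed, with later fields only breaking ties among earlier ones, which is why the order of the keys in the $sort document matters. The following plain JavaScript sketch imitates that comparison on group keys shaped like the ones in the question. The sample values and the compareByKeys helper are made up for illustration and this is not MongoDB's own implementation.

```javascript
// Group keys shaped like the _id produced by the $group stage.
const groups = [
  { _id: { year: "2013", month: "01", day: "02" } },
  { _id: { year: "2012", month: "12", day: "01" } },
  { _id: { year: "2013", month: "01", day: "01" } },
  { _id: { year: "2012", month: "11", day: "30" } }
];

// Keys in priority order, like {"_id.year": 1, "_id.month": 1, "_id.day": 1}.
const sortKeys = ["year", "month", "day"];

// Hypothetical comparator: the first key that differs decides the order.
function compareByKeys(a, b) {
  for (const key of sortKeys) {
    if (a._id[key] < b._id[key]) return -1;
    if (a._id[key] > b._id[key]) return 1;
  }
  return 0;
}

const sorted = groups.slice().sort(compareByKeys);
console.log(sorted.map(g => g._id.year + g._id.month + g._id.day));
// [ '20121130', '20121201', '20130101', '20130102' ]
```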
I found one possible solution here. I have already tested it.
There is no limit for sorting (Mongo documentation):
{ $sort: { <field1>: <sort order>, <field2>: <sort order> ... } }

mongodb $aggregate empty array and multiple documents

MongoDB has the documents below:
> db.test.find({name:{$in:["abc","abc2"]}})
{ "_id" : 1, "name" : "abc", "scores" : [ ] }
{ "_id" : 2, "name" : "abc2", "scores" : [ 10, 20 ] }
I want to get the length of the scores array for each document. How should I do that?
Tried below command:
db.test.aggregate({$match:{name:"abc2"}}, {$unwind: "$scores"}, {$group: {_id:null, count:{$sum:1}}} )
Result:
{ "_id" : null, "count" : 2 }
But below command:
db.test.aggregate({$match:{name:"abc"}}, {$unwind: "$scores"}, {$group: {_id:null, count:{$sum:1}}} )
returns nothing. Questions:
How should I get the length of scores for each of 2 or more documents in one command?
Why does the second command return nothing, and how should I check if the array is empty?
So this is actually a common problem. The result of the $unwind phase in an aggregation pipeline where the array is "empty" is to "remove" the document from the pipeline results.
In order to return a count of "0" for such an "empty" array, you need to do something like the following.
In MongoDB 2.6 or greater, just use $size:
db.test.aggregate([
{ "$match": { "name": "abc" } },
{ "$group": {
"_id": null,
"count": { "$sum": { "$size": "$scores" } }
}}
])
In earlier versions you need to do this:
db.test.aggregate([
{ "$match": { "name": "abc" } },
{ "$project": {
"name": 1,
"scores": {
"$cond": [
{ "$eq": [ "$scores", [] ] },
{ "$const": [false] },
"$scores"
]
}
}},
{ "$unwind": "$scores" },
{ "$group": {
"_id": null,
"count": { "$sum": {
"$cond": [
"$scores",
1,
0
]
}}
}}
])
The modern operation is simple since $size will just "measure" the array. In the latter case you need to "replace" the array with a single false value when it is empty to avoid $unwind "destroying" this for an "empty" statement.
So replacing with false allows the $cond "ternary" to choose whether to add 1 or 0 to the $sum of the overall statement.
That is how you get the length of "empty arrays".
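The difference between the two behaviours can be imitated in plain JavaScript. The functions below are rough stand-ins for $unwind and $size written for this illustration (their names are made up) and they run on the two documents from the question.

```javascript
const docs = [
  { _id: 1, name: "abc", scores: [] },
  { _id: 2, name: "abc2", scores: [10, 20] }
];

// Rough stand-in for { $unwind: "$scores" }: one output document per
// array element, so a document with an empty array produces nothing.
function unwindScores(input) {
  return input.flatMap(doc =>
    doc.scores.map(score => ({ ...doc, scores: score }))
  );
}

// Rough stand-in for { $size: "$scores" }, evaluated per document.
function scoreSizes(input) {
  return input.map(doc => ({ _id: doc._id, count: doc.scores.length }));
}

console.log(unwindScores(docs).map(d => d._id)); // [ 2, 2 ]
console.log(JSON.stringify(scoreSizes(docs)));
// [{"_id":1,"count":0},{"_id":2,"count":2}]
```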
To get the length of scores in 2 or more documents you just need to change the _id value in the $group pipeline which contains the distinct group by key, so in this case you need to group by the document _id.
Your second aggregation returns nothing because the $match query pipeline passed a document which had an empty scores array, and $unwind then dropped it. To check if the array is empty, your match query should be
{'scores.0': {$exists: true}} or {scores: {$not: {$size: 0}}}
Overall, your aggregation should look like this:
db.test.aggregate([
{ "$match": {"scores.0": { "$exists": true } } },
{ "$unwind": "$scores" },
{
"$group": {
"_id": "$_id",
"count": { "$sum": 1 }
}
}
])

Mongodb $cond in aggregation framework

I have a collection with documents that look like the following:
{
ipAddr: '1.2.3.4',
"results" : [
{
"Test" : "Sight",
"Score" : "FAIL",
"Reason" : "S1002"
},
{
"Test" : "Speed",
"Score" : "FAIL",
"Reason" : "85"
},
{
"Test" : "Sound",
"Score" : "FAIL",
"Reason" : "A1001"
}
],
"finalGrade" : "FAILED"
}
Here's the aggregation query I'm trying to write, what I want to do (see commented out piece), is to create a grouped field, per ipAddr, of the
'Reason / Error' code, but only if the Reason code begins with a specific letter, and only add the code in once, I tried the following:
db.aggregate([
{$group:
{ _id: "$ipAddr",
attempts: {$sum:1},
results: {$push: "$finalGrade"},
// errorCodes: {$addToSet: {$cond: ["$results.Reason": /[A|B|S|N.*/, "$results.Reason", ""]}},
finalResult: {$last: "$finalGrade"} }
}
]);
Everything works, excluding the commented out 'errorCodes' line. The logic I'm attempting to create is:
"Add to the errorCodes set the value of the results.Reason code IF it begins with an A, B, S, or N, otherwise there is nothing to add".
For the Record above, the errorCodes set should contain:
...
errorCodes: [S1002,A1001],
...
That line is not working because $cond expects a boolean expression as its first argument, and the "field": /regex/ form is not one. $project is a phase where you can transform the original document based on $cond expressions (among other things).
You need two steps in the aggregation pipeline before you can $group - first you need to $unwind the results array, and next you need to $match to filter out the results you don't care about.
That would do the simple thing of just throwing out the results with error codes you don't care about keeping, but it sounds like you want to count the total number of failures including all error codes, but then only add particular ones to the output array? There isn't a straight-forward way to do that, you would have to make two $group $unwind passes in the pipeline.
Something similar to this will do it:
db.aggregate([
{$unwind : "$results"},
{$group:
{ _id: "$ipAddr",
attempts: {$sum:1},
results: {$push : "$results"},
finalGrade: {$last : "$finalGrade" }
}
},
{$unwind: "$results"},
{$match: {"results.Reason":/yourMatchExpression/} },
{$group:
{ _id: "$_id",
attempts: {$last:"$attempts"},
errorCodes: {$addToSet: "$results.Reason"},
finalResult: {$last: "$finalGrade"}
}
}
]);
If you only want to count attempts that have the matching error code then you can do that with a single $group - you will need to do $unwind, $match and $group. You could use $project with $cond as you had it, but then your array of errorCodes will have an empty string entry along with all the proper error codes.
As of Mongo 2.4, $regex can be used for pattern matching, but not as an expression returning a boolean, which is what $cond requires.
So you can either use a $match stage with the $regex keyword:
http://mongotry.herokuapp.com/#?bookmarkId=52fb39e207fc4c02006fcfed
[
{
"$unwind": "$results"
},
{
"$match": {
"results.Reason": {
"$regex": "^[SA]"
}
}
},
{
"$group": {
"_id": "$ipAddr",
"attempts": {
"$sum": 1
},
"results": {
"$push": "$finalGrade"
},
"finalResult": {
"$last": "$finalGrade"
},
"errorCodes": {
"$addToSet": "$results.Reason"
}
}
}
]
or you can use $substr as your pattern matching is very simple
http://mongotry.herokuapp.com/index.html#?bookmarkId=52fb47bc7f295802001baa38
[
{
"$unwind": "$results"
},
{
"$group": {
"_id": "$ipAddr",
"errorCodes": {
"$addToSet": {
"$cond": [
{
"$or": [
{
"$eq": [
{
"$substr": [
"$results.Reason",
0,
1
]
},
"A"
]
},
{
"$eq": [
{
"$substr": [
"$results.Reason",
0,
1
]
},
"S"
]
}
]
},
"$results.Reason",
"null"
]
}
}
}
}
]
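For more than two letters, the repeated $eq / $substr branches can be generated instead of typed out. This is a sketch in plain JavaScript: startsWithAny is a hypothetical helper, not a MongoDB operator, and the letters A, B, S and N are taken from the question. It only builds the pipeline document.

```javascript
// Hypothetical helper: builds an $or of "first character equals letter"
// tests, the same shape as the hand-written condition above.
function startsWithAny(fieldPath, letters) {
  const firstChar = { $substr: [fieldPath, 0, 1] };
  return { $or: letters.map(letter => ({ $eq: [firstChar, letter] })) };
}

const pipeline = [
  { $unwind: "$results" },
  { $group: {
      _id: "$ipAddr",
      errorCodes: {
        $addToSet: {
          $cond: [
            startsWithAny("$results.Reason", ["A", "B", "S", "N"]),
            "$results.Reason",
            "null" // placeholder for non-matching codes, as in the answer
          ]
        }
      }
  } }
];

console.log(JSON.stringify(pipeline, null, 1));
```

As in the original, the placeholder value still ends up in the set and has to be filtered out afterwards.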