Database sort on numeric values that are actually strings - mongodb

I have the query below. In it, field_a is a String property and field_b is an array of type Number. I want an array of unique field_a / field_b combinations. Here field_a contains numeric values, but stored as strings, so I want to apply a natural sort in the aggregation pipeline. $natural can only be used with a query such as db.collection.find().sort( { $natural: 1 } ).
So how can I use a natural sort in MongoDB, or do I need to rely on JavaScript functions or lodash/underscore.js?
db.collection.aggregate([
  { "$group": { "_id": { field_a: "$field_a", field_b: "$field_b" } } },
  { $project: { a: "$_id" } },
  { "$group": { "_id": 'a', "res": { "$addToSet": "$_id" } } },
  { "$unwind": "$res" },
  { "$sort": { "res": 1 } },
  { "$group": { "_id": null, "res": { "$push": "$res" } } }
])

In short, this is what you want to do here:
db.collection.aggregate([
  { "$group": {
    "_id": {
      "field_a": {
        "$concat": [
          { "$substrCP": [
            "0000000000",
            0,
            { "$subtract": [ 10, { "$strLenCP": "$field_a" } ] }
          ]},
          "$field_a"
        ]
      },
      "field_b": "$field_b"
    }
  }},
  { "$sort": { "_id": 1 } }
])
Explanation
As a basic concept, the problem you have is that "strings" sort in a way that does not translate to how numeric sorting works.
As a brief example, these documents use a string value:
{ "_id" : ObjectId("5928276f84c3559bc2fd458b"), "a" : "5" }
{ "_id" : ObjectId("5928277484c3559bc2fd458c"), "a" : "50" }
{ "_id" : ObjectId("5928277e84c3559bc2fd458d"), "a" : "60" }
{ "_id" : ObjectId("5928278284c3559bc2fd458e"), "a" : "6" }
If you try to sort these, then the lexical order applies:
> db.list.find().sort({ "a": 1 })
{ "_id" : ObjectId("5928276f84c3559bc2fd458b"), "a" : "5" }
{ "_id" : ObjectId("5928277484c3559bc2fd458c"), "a" : "50" }
{ "_id" : ObjectId("5928278284c3559bc2fd458e"), "a" : "6" }
{ "_id" : ObjectId("5928277e84c3559bc2fd458d"), "a" : "60" }
As strings, this makes sense, since "50" begins with "5" and therefore sorts before "6".
Since the aggregation framework cannot "cast" these as numeric values, your only option is to present the "strings" in a way in which they order lexically the same as they would numerically.
In brief terms, we "zero pad" them: the values become fixed-length strings that are prefixed, or "padded", with 0, which makes the "strings" order the way they would numerically:
db.list.aggregate([
  { "$project": {
    "a": {
      "$concat": [
        { "$substrCP": [
          "0000000000",
          0,
          { "$subtract": [ 10, { "$strLenCP": "$a" } ] }
        ]},
        "$a"
      ]
    }
  }},
  { "$sort": { "a": 1 } }
])
And this will produce a list in order like:
{ "_id" : ObjectId("5928276f84c3559bc2fd458b"), "a" : "0000000005" }
{ "_id" : ObjectId("5928278284c3559bc2fd458e"), "a" : "0000000006" }
{ "_id" : ObjectId("5928277484c3559bc2fd458c"), "a" : "0000000050" }
{ "_id" : ObjectId("5928277e84c3559bc2fd458d"), "a" : "0000000060" }
The basic premise here is that you take a "template" string, in this example a string of 0s that is 10 characters long. Then we look at the length of the field value to transform using $strLenCP and $subtract that length from 10, the length of the template string used here.
The difference in length is fed to the $substrCP operator as the number of characters to take from the template. This output is then fed to $concat in order to make a "string" which, like the template, is 10 characters long, but starts with zeros and ends with the initial numeric string.
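Roughly, the expression evaluates like this for the sample value "50" (intermediate results shown as comments):
// "a" is "50"
// { "$strLenCP": "$a" }                    -> 2
// { "$subtract": [ 10, 2 ] }               -> 8
// { "$substrCP": [ "0000000000", 0, 8 ] }  -> "00000000"
// { "$concat": [ "00000000", "50" ] }      -> "0000000050"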
In your actual usage, this will end up as part of a composite key. Since the transformed field is first in the key order, simply sorting by _id takes it into account, sorting primarily by the values in the first key and then by the second.


How to find percentage of grouping containing a specific word

I am trying to calculate the percentage of listings in a MongoDB collection that contain a specific word, grouped by one of the collection's fields.
I have managed to get the count of listings containing the word per group, but not the percentage against the total count of each group's listings.
My collection looks like this:
{
"_id" : "103456",
"metadata" : {
"type" : "Bike",
"brand" : "Siamoto",
"model" : "Siamoto vespa '01 - € 550 EUR (Negotiable)"
}
},
{
"_id" : "103457",
"metadata" : {
"type" : "Bike",
"brand" : "BMW",
"model" : "BMW ADFR '06 - € 5680 EUR"
}
}
I want to project the percentage of ads per metadata.brand that contain the word "Negotiable" in metadata.model.
I have used for the count something like:
db.advertisements.aggregate([
{ $match: { $text: { $search: "Negotiable" } } },
{ $group: { _id: "$metadata.brand", Count: { $sum: 1} } }
])
and it worked but I can't find a workaround for the percentage. Thanks to all
For what you are trying to do, using a $text search or even a $regex is the wrong approach. All these can do is return the "matching" documents only from within the collection.
Using Aggregate to Count String Matches
Whilst not as flexible as a regular expression (and sadly there is no aggregation operator equivalent at this time, though there will be in future releases; see SERVER-11947), the better option is to use $indexOfCP in order to match the occurrence of the "string" and then count those against the "total counts" from each grouping:
db.advertisements.aggregate([
  { "$group": {
    "_id": "$metadata.brand",
    "totalCount": { "$sum": 1 },
    "matchedCount": {
      "$sum": {
        "$cond": [{ "$ne": [{ "$indexOfCP": [ "$metadata.model", "Negotiable" ] }, -1 ] }, 1, 0]
      }
    }
  }},
  { "$addFields": {
    "percentage": {
      "$cond": {
        "if": { "$ne": [ "$matchedCount", 0 ] },
        "then": {
          "$multiply": [
            { "$divide": [ "$matchedCount", "$totalCount" ] },
            100
          ]
        },
        "else": 0
      }
    }
  }},
  { "$sort": { "percentage": -1 } }
])
And the results:
{ "_id" : "Siamoto", "totalCount" : 1, "matchedCount" : 1, "percentage" : 100 }
{ "_id" : "BMW", "totalCount" : 1, "matchedCount" : 0, "percentage" : 0 }
Note that the $group is used for the accumulation of both the total documents found within the "brand" as well as those where the string was matched. The $cond operator used here is a "ternary" or if/then/else statement which evaluates a boolean expression and then returns one value where true or another where false. In this case the test is whether $indexOfCP did NOT return -1, i.e. "not found".
The "percentage" is actually done in a separate stage, which in this case uses $addFields to add the "additional field". The operation is basically a $divide over the two accumulated values from the previous stage. The $cond is just applied to avoid "divide by zero" errors, and the $multiply simply moves the decimal places into something that looks more like a "percentage". But the basic premise is that calculations which require "totals" to be accumulated first will always be a manipulation in a "later stage".
MongoDB 4.2 (proposed) Preview
FYI, using the current "unfinalized" syntax for $regexFind from MongoDB 4.2 (proposed, and yet to be finalized if included in that release) onwards, this would be something like:
db.advertisements.aggregate([
  { "$group": {
    "_id": "$metadata.brand",
    "totalCount": { "$sum": 1 },
    "matchedCount": {
      "$sum": {
        "$cond": {
          "if": {
            "$ne": [
              { "$regexFind": {
                "input": "$metadata.model",
                "regex": /Negotiable/i
              }},
              null
            ]
          },
          "then": 1,
          "else": 0
        }
      }
    }
  }},
  { "$addFields": {
    "percentage": {
      "$cond": {
        "if": { "$ne": [ "$matchedCount", 0 ] },
        "then": {
          "$multiply": [
            { "$divide": [ "$matchedCount", "$totalCount" ] },
            100
          ]
        },
        "else": 0
      }
    }
  }},
  { "$sort": { "percentage": -1 } }
])
Again noting strongly that the "current" implementation may be subject to change by the time it is released. This is how it works on the current 4.1.9-17-g0a856820ba development release.
Using MapReduce
An alternate approach where either your MongoDB version does not support $indexOfCP OR you need more flexibility in how you "match the string" is to use mapReduce for the aggregation instead:
db.advertisements.mapReduce(
  function() {
    emit(this.metadata.brand, {
      totalCount: 1,
      matchedCount: (/Negotiable/i.test(this.metadata.model)) ? 1 : 0
    });
  },
  function(key, values) {
    var obj = { totalCount: 0, matchedCount: 0 };
    values.forEach(value => {
      obj.totalCount += value.totalCount;
      obj.matchedCount += value.matchedCount;
    });
    return obj;
  },
  {
    "out": { "inline": 1 },
    "finalize": function(key, value) {
      value.percentage = (value.matchedCount != 0)
        ? (value.matchedCount / value.totalCount) * 100
        : 0;
      return value;
    }
  }
)
This has a similar result, but in a very "mapReduce" specific way:
{
"_id" : "BMW",
"value" : {
"totalCount" : 1,
"matchedCount" : 0,
"percentage" : 0
}
},
{
"_id" : "Siamoto",
"value" : {
"totalCount" : 1,
"matchedCount" : 1,
"percentage" : 100
}
}
The logic is pretty much the same. We "emit" using the "key" for the "brand" and then use another ternary to determine whether to count a "match" or not. In this case a regular expression test() operation, and even using "case insensitive" matching as an example.
The "reducer" part simply accumulates the values that were emitted, and the finalize function is where the "percentage" is returned by the same division and multiplication process.
The main difference between the two, other than basic capabilities, is that mapReduce cannot do "further things" beyond the accumulation and basic manipulation in finalize. The "sorting" demonstrated in the aggregation pipeline cannot be done with mapReduce without outputting to a separate collection and doing a separate find() and sort() on the documents it contains.
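For completeness, a hedged sketch of that approach, where mapFunction, reduceFunction and finalizeFunction stand in for the functions shown above, and "brand_percentages" is a hypothetical output collection name:
db.advertisements.mapReduce(mapFunction, reduceFunction, {
  "out": "brand_percentages",
  "finalize": finalizeFunction
});
// ...then sort with a separate query on the output collection
db.brand_percentages.find().sort({ "value.percentage": -1 });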
Either way works, and it just depends on your needs and the capabilities of what you have available. Of course any aggregate() approach will be much faster than using the JavaScript evaluation of mapReduce. So you probably want aggregate() as your preference where possible.

MongoDB aggregation, find number of distinct values in documents' arrays

Reading the docs, I see you can get the number of elements in document arrays. For example given the following documents:
{ "_id" : 1, "item" : "ABC1", "description" : "product 1", colors: [ "blue", "black", "red" ] }
{ "_id" : 2, "item" : "ABC2", "description" : "product 2", colors: [ "purple" ] }
{ "_id" : 3, "item" : "XYZ1", "description" : "product 3", colors: [ ] }
and the following query:
db.inventory.aggregate([{$project: {item: 1, numberOfColors: { $size: "$colors" }}}])
We would get the number of elements in each document's colors array:
{ "_id" : 1, "item" : "ABC1", "numberOfColors" : 3 }
{ "_id" : 2, "item" : "ABC2", "numberOfColors" : 1 }
{ "_id" : 3, "item" : "XYZ1", "numberOfColors" : 0 }
I've not been able to figure out if and how you could sum up all the colors in all the documents directly from a query, ie:
{ "totalColors": 4 }
You can use the following query to get the count of all colors in all docs:
db.inventory.aggregate([
{ $unwind: '$colors' } , // expands nested array so we have one doc per each array value
{ $group: {_id: null, allColors: {$addToSet: "$colors"} } } , // find all colors
{ $project: { totalColors: {$size: "$allColors"}}} // find count of all colors
])
Infinitely better is to simply $sum the $size:
db.inventory.aggregate([
  { "$group": { "_id": null, "totalColors": { "$sum": { "$size": "$colors" } } } }
])
If you wanted "distinct in each document" then you would instead:
db.inventory.aggregate([
{ "$group": {
"_id": null,
"totalColors": {
"$sum": {
"$size": { "$setUnion": [ [], "$colors" ] }
}
}
}}
])
Where $setUnion takes values like ["purple","blue","purple"] and makes them into ["purple","blue"], as a "set" with "distinct items".
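As a quick illustration of that "set" behaviour (a sketch only, using a throw-away document in a hypothetical scratch collection):
db.scratch.insert({ "colors": [ "purple", "blue", "purple" ] })
db.scratch.aggregate([
  { "$project": { "_id": 0, "distinctColors": { "$setUnion": [ [], "$colors" ] } } }
])
// -> { "distinctColors": [ "purple", "blue" ] }   (set order is not guaranteed)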
And if you really want "distinct across documents" then don't accumulate the "distinct" into a single document. That causes performance issues and simply does not scale to large data sets, and can possibly break the 16MB BSON Limit. Instead accumulate naturally via the key:
db.inventory.aggregate([
{ "$unwind": "$colors" },
{ "$group": { "_id": "$colors" } },
{ "$group": { "_id": null, "totalColors": { "$sum": 1 } } }
])
Where you only use $unwind because you want "distinct" values from the array as combined with other documents. Generally $unwind should be avoided unless the value contained in the array is being accessed in the "grouping key" _id of $group. Where it is not, it is better to treat arrays using other operators instead, since $unwind creates a "copy" of the whole document per array element.
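To see what that "copy per element" looks like, here is the output of $unwind alone against the sample documents above (note the document with the empty array produces nothing):
db.inventory.aggregate([ { "$unwind": "$colors" } ])
{ "_id" : 1, "item" : "ABC1", "description" : "product 1", "colors" : "blue" }
{ "_id" : 1, "item" : "ABC1", "description" : "product 1", "colors" : "black" }
{ "_id" : 1, "item" : "ABC1", "description" : "product 1", "colors" : "red" }
{ "_id" : 2, "item" : "ABC2", "description" : "product 2", "colors" : "purple" }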
And of course there was also nothing wrong with simply using .distinct() here, which will return the "distinct" values "as an array", for which you can just check the array's length in code:
var totalSize = db.inventory.distinct("colors").length;
Which for the simple operation you are asking, would be the overall fastest approach for a simple "count of distinct elements". Of course the limitation remains that the result cannot exceed the 16MB BSON limit as a payload. Which is where you defer to .aggregate() instead.

How to project additional data from an aggregate result with MongoDB?

I'm learning MongoDB and trying to group a collection.
What I'm looking for is to group by year, get the max "average note" field, and display the primary name of the document related to that average.
For example, if I have:
Name | Average | Year
Name_01 | 7.56 | 1995
Name_02 | 8.96 | 1995
Name_03 | 3.25 | 2005
Name_04 | 4.36 | 2005
Name_05 | 7.52 | 2020
I need:
Name | Average | Year
Name_02 | 8.96 | 1995
Name_05 | 7.52 | 2020
Name_04 | 4.36 | 2005
I already did the group and the max. Here is my code:
db.foobar.aggregate([
{
$group: { _id: '$year_published', max: { $max: '$statistics.average' }}
},
{
$project: { _id: 1, max: 1 }
},
{
$sort: { max: -1 }
}
])
Which gives me this kind of result:
{
"result" : [
{
"_id" : 1999,
"max" : 8.0343000000000000
},
{
"_id" : 1985,
"max" : 7.8833299999999999
}
// And so on...
]
}
But I'd also like to project the primary name of the document related to the "max" to get something like:
{
"result" : [
{
"_id" : 1999,
"max" : 8.0343000000000000,
"name": "Foo Bar"
},
{
"_id" : 1985,
"max" : 7.8833299999999999,
"name": "Lorem Ipsum"
}
// And so on...
]
}
NB: The next part of the question adds complexity regarding the name (because of my document structure). It's not my main concern right now, but I add it to reflect my whole problem.
The primary name is a bit tricky to get. For each document, I've got an array of objects like that:
{
"names" : [
{
"type" : "primary",
"value" : "Foo bar"
},
{
"type" : "alternate",
"value" : "Foo foo"
},
{
"type" : "alternate",
"value" : "Bar bar"
}
]
}
And what I'm trying to get is the name with the "primary" type (i.e. "Foo bar" in my example).
Here is the structure of my documents:
{
"_id" : ObjectId("56338f2bdc99b8ec22a43328"),
"names" : [
{
"type" : "primary",
"value" : "Foo bar"
},
{
"type" : "alternate",
"value" : "Barr foo"
}
],
"year_published" : 1992
"statistics" : {
"average" : 6.6057699999999997
}
}
I think I'm not so far but I don't know how to do it... Could you please help me?
If you want the "paried" values out of a particular doccument with a "max" value then $max is not for you. Instead what you need to do is $sort the data first and then use the $first operator.
db.foobar.aggregate([
  { "$sort": { "year_published": 1, "statistics.average": -1 } },
  { "$group": {
    "_id": "$year_published",
    "max": { "$first": "$statistics.average" },
    "name": {
      "$first": {
        "$setDifference": [
          { "$map": {
            "input": "$names",
            "as": "name",
            "in": {
              "$cond": {
                "if": { "$eq": [ "$$name.type", "primary" ] },
                "then": "$$name.value",
                "else": false
              }
            }
          }},
          [false]
        ]
      }
    }
  }},
  { "$unwind": "$name" }
])
The $first and $last operators act on "grouping boundary" data. This means they return values from the document that occurs either at the beginning or the end of the group defined by the grouping _id.
That is why you "sort" first, so th documents are in order for selection.
By contrast $max and $min just pick the "max/min" value from anywhere in the documents in the sample. That's fine when it's all you want, but if you want "related" fields, then you must sort first.
That's the basics of it. The other part, dealing with filtering your array, is most optimally done with the $map and $setDifference combination as shown. The $map allows testing of a condition via $cond on each array element "in-line", and returns either value depending on whether the condition is true or false. The result is still, of course, an array of equal length.
The $setDifference essentially filters out anything returned as false, so the only thing left should be the "primary". Still an array, which is why $unwind is still used, though it's only a single element array.
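To make that concrete, here is a rough trace of how those operators evaluate against the "names" array from the sample document:
// $map over [ { "type": "primary", "value": "Foo bar" }, { "type": "alternate", "value": "Barr foo" } ]
//   with the $cond shown above               ->  [ "Foo bar", false ]
// $setDifference of that result and [false]  ->  [ "Foo bar" ]
// $first picks that single-element array for the group, and $unwind flattens it to "Foo bar"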
Future MongoDB versions will do this a little better with $filter and $arrayElemAt. Here's a glimpse:
db.foobar.aggregate([
  { "$sort": { "year_published": 1, "statistics.average": -1 } },
  { "$group": {
    "_id": "$year_published",
    "max": { "$first": "$statistics.average" },
    "name": {
      "$first": {
        "$arrayElemAt": [
          { "$filter": {
            "input": "$names",
            "as": "name",
            "cond": {
              "$eq": [ "$$name.type", "primary" ]
            }
          }},
          0
        ]
      }
    }
  }}
])
But none of this changes the basic rules of "sort first" and then just pick up the values from the grouping boundary.
Please try the code below:
You need to select the "name" field in the $group pipeline stage with the help of $first.
$first selects the value that results from applying an expression to the first document in a group of documents that share the same group-by key.
db.foobar.aggregate([
  { "$unwind": "$names" },
  { "$match": { "names.type": "primary" } },
  { "$sort": { "year_published": 1, "statistics.average": -1 } },
  { "$group": {
    "_id": "$year_published",
    "name": { "$first": "$names.value" },
    "max": { "$max": "$statistics.average" }
  }},
  { "$sort": { "max": -1 } }
]).pretty();
This will give you the required result :
{
"result" : [
{
"_id" : 1999,
"max" : 8.0343000000000000,
"name": "Foo Bar"
},
{
"_id" : 1985,
"max" : 7.8833299999999999,
"name": "Lorem Ipsum"
}
// And so on...
]
}

Documents in MongoDB where last n sub-array elements contain a value

Consider this set of data in MongoDB...
{
_id: 1,
name: "Johnny",
properties: [
{
type: "A",
value: 257,
date: "4/1/2014"
},
{
type: "A",
value: 200,
date: "4/2/2014"
},
{
type: "B",
value: 301,
date: "4/3/2014"
},
...]
}
What is the proper way to query the documents in which one (or more) of the last two "properties" elements has a value > x, or one (or more) of the last two "properties" elements of type "A" has a value > x?
If you can stomach modifying your insertion method, try as follows:
Change your updates to push the following:
doc = { type : "A", "value" : 123, "date" : new Date() }
db.foo.update( {_id:1}, { "$push" : { "properties" : { "$each" : [ doc ], "$sort" : { date : -1} } } } )
This will give you an array of documents sorted in descending order by time, making the "most recent" document first.
You can now use standard MongoDB dot notation to query against the 0, 1, etc. elements of your properties array, which logically represent the most recent additions.
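For example (a sketch under the assumption that the array is kept most-recent-first by the $sort on insert), to find documents where either of the two most recent properties is of type "A" with a value greater than 200:
db.foo.find({
  "$or": [
    { "properties.0.type": "A", "properties.0.value": { "$gt": 200 } },
    { "properties.1.type": "A", "properties.1.value": { "$gt": 200 } }
  ]
})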
As per the comments, the aggregation framework is for a lot more than simply "aggregating" values, so you can take advantage of the various pipeline operators to do very advanced things that cannot be achieved simply using .find()
db.collection.aggregate([
  // Match documents that "could" meet the conditions to narrow down
  { "$match": {
    "properties": { "$elemMatch": {
      "type": "A", "value": { "$gt": 200 }
    }}
  }},
  // Keep a copy of the document for later with an array copy
  { "$project": {
    "_id": {
      "_id": "$_id",
      "name": "$name",
      "properties": "$properties"
    },
    "properties": 1
  }},
  // Unwind the array to "de-normalize"
  { "$unwind": "$properties" },
  // Get the "last" element of the array and copy the existing one
  { "$group": {
    "_id": "$_id",
    "properties": { "$last": "$_id.properties" },
    "last": { "$last": "$properties" },
    "count": { "$sum": 1 }
  }},
  // Unwind the copy again
  { "$unwind": "$properties" },
  // Project to mark the element you already have
  { "$project": {
    "properties": 1,
    "last": 1,
    "count": 1,
    "seen": { "$eq": [ "$properties", "$last" ] }
  }},
  // Match again, being careful to keep any array with one element only
  // This gets rid of the element you already kept
  { "$match": {
    "$or": [
      { "seen": false },
      { "seen": true, "count": 1 }
    ]
  }},
  // Group to get the second last element as "next"
  { "$group": {
    "_id": "$_id",
    "last": { "$last": "$last" },
    "next": { "$last": "$properties" }
  }},
  // Then match to see if either of those elements fits
  { "$match": {
    "$or": [
      { "last.type": "A", "last.value": { "$gt": 200 } },
      { "next.type": "A", "next.value": { "$gt": 200 } }
    ]
  }},
  // Finally restore your matching documents
  { "$project": {
    "_id": "$_id._id",
    "name": "$_id.name",
    "properties": "$_id.properties"
  }}
])
Running through that in a bit more detail:
The first $match usage is to make sure you are only working on documents that can "possibly" match your extended conditions. Always a good idea to optimize like this.
The next stage is to $project since you likely want to keep the original document detail and you are at least going to need the array again in order to get the second last element.
The next stages make use of $unwind in order to break the array into individual documents which is then followed by $group which is used to find the last item on the document _id boundary. This is actually the last item in the array. Plus you keep a count of the array elements.
So then, after using $unwind again on the original array content, the usage of $project adds a "seen" field to the document, indicating via the $eq operator whether or not the document from the original array is the one that was previously kept as the "last" element.
After that stage you again issue a $match in order to filter that last document from the result, but also making sure in the condition that you are not removing anything that originally matched where the array length is actually 1.
From here you want to $group again in order to get the "second last" element from the array (or indeed the same "last" element where there was only one).
The final steps are simply to $match where either of those last two elements meets the conditions, and then finally $project the document in its original form.
So while that is fairly involved, and of course increases in complexity with the number of items you want to test at the end of the array, it can be done, and it shows how aggregate is well suited to the problem.
Where possible it is the best approach, as invoking the JavaScript interpreter incurs an overhead compared to the native code used by aggregate.
Using mapReduce would remove the code complexity for taking the last two possible elements (or more) but it will invoke the JavaScript interpreter by nature and will therefore run much more slowly.
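As a rough sketch only (not part of the original answer), such a mapReduce might look like the following, where the plain JavaScript makes the "last two elements" logic trivial at the cost of speed:
db.collection.mapReduce(
  function () {
    // take the last two elements (or the only one, if the array has a single member)
    var lastTwo = this.properties.slice(-2);
    var matched = lastTwo.some(function (p) {
      return p.type === "A" && p.value > 200;
    });
    if (matched) emit(this._id, this);
  },
  function (key, values) { return values[0]; },   // keys are unique here, so reduce is effectively a no-op
  { "out": { "inline": 1 } }
)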
For the record, since the sample in the question would not be a match, here is some data that will match the last two documents, one of which only has one element in the array:
{
"_id" : 1,
"name" : "Johnny",
"properties" : [
{
"type" : "A",
"value" : 257,
"date" : "4/1/2014"
},
{
"type" : "A",
"value" : 200,
"date" : "4/2/2014"
},
{
"type" : "B",
"value" : 301,
"date" : "4/3/2014"
}
]
}
{
"_id" : 2,
"name" : "Ace",
"properties" : [
{
"type" : "A",
"value" : 257,
"date" : "4/1/2014"
},
{
"type" : "B",
"value" : 200,
"date" : "4/2/2014"
},
{
"type" : "B",
"value" : 301,
"date" : "4/3/2014"
}
]
}
{
"_id" : 3,
"name" : "Bo",
"properties" : [
{
"type" : "A",
"value" : 257,
"date" : "4/1/2014"
}
]
}
{
"_id" : 4,
"name" : "Sue",
"properties" : [
{
"type" : "A",
"value" : 257,
"date" : "4/1/2014"
},
{
"type" : "A",
"value" : 240,
"date" : "4/2/2014"
},
{
"type" : "B",
"value" : 301,
"date" : "4/3/2014"
}
]
}
Have you considered using a $where clause? It is not the most efficient, but I think it should get you what you want. For instance, if you wanted every document where either of the last two properties elements has a value field greater than 200, you could try:
db.collection.find({ "properties": { "$exists": true },
  "$where": "(this.properties[this.properties.length-1].value > 200) || (this.properties[this.properties.length-2].value > 200)"
});
This needs some work for edge cases (array < 2 members for example) and more complex queries (by the "type" field too) but should get you started.
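A hedged sketch of one way to cover both, using the function form of $where so the "last two or fewer" logic and the "type" test are easier to express:
db.collection.find({
  "properties": { "$exists": true },
  "$where": function () {
    var lastTwo = this.properties.slice(-2);   // handles arrays with fewer than two members
    return lastTwo.some(function (p) {
      return p.type === "A" && p.value > 200;
    });
  }
});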

sort array in query and project all fields

I would like to sort a nested array at query time while also projecting all fields in the document.
Example document:
{ "_id" : 0, "unknown_field" : "foo", "array_to_sort" : [ { "a" : 3, "b" : 4 }, { "a" : 3, "b" : 3 }, { "a" : 1, "b" : 0 } ] }
I can perform the sorting with an aggregation but I cannot preserve all the fields I need. The application does not know at query time what other fields may appear in each document, so I am not able to explicitly project them. If I had a wildcard to project all fields then this would work:
db.c.aggregate([
{$unwind: "$array_to_sort"},
{$sort: {"array_to_sort.b":1, "array_to_sort:a": 1}},
{$group: {_id:"$_id", array_to_sort: {$push:"$array_to_sort"}}}
]);
...but unfortunately, it produces a result that does not contain the "unknown_field":
{
"_id" : 0,
"array_to_sort" : [
{
"a" : 1,
"b" : 0
},
{
"a" : 3,
"b" : 3
},
{
"a" : 3,
"b" : 4
}
]
}
Here is the insert command incase you would like to experiment:
db.c.insert({"unknown_field": "foo", "array_to_sort": [{"a": 3, "b": 4}, {"a": 3, "b":3}, {"a": 1, "b":0}]})
I cannot pre-sort the array because the sort criteria is dynamic. I may be sorting by any combination of a and/or b ascending/descending at query time. I realize I may need to do this in my client application, but it would be sweet if I could do it in mongo because then I could also $slice/skip/limit the results for paging instead of retrieving the entire array every time.
Since you are grouping on the document _id you can simply place the fields you wish to keep within the grouping _id. Then you can re-form using $project
db.c.aggregate([
  { "$unwind": "$array_to_sort" },
  { "$sort": { "array_to_sort.b": 1, "array_to_sort.a": 1 } },
  { "$group": {
    "_id": {
      "_id": "$_id",
      "unknown_field": "$unknown_field"
    },
    "Oarray_to_sort": { "$push": "$array_to_sort" }
  }},
  { "$project": {
    "_id": "$_id._id",
    "unknown_field": "$_id.unknown_field",
    "array_to_sort": "$Oarray_to_sort"
  }}
]);
The other "trick" in there is using a temporary name for the array in the grouping stage. This is so when you $project and change the name, you get the fields in the order specified in the projection statement. If you did not, then the "array_to_sort" field would not be the last field in the order, as it is copied from the prior stage.
That is an intended optimization in $project, but if you want the order then you can do it as above.
For completely unknown structures there is the mapReduce way of doing things:
db.c.mapReduce(
  function () {
    this["array_to_sort"].sort(function(a, b) {
      return a.a - b.a || a.b - b.b;
    });
    emit( this._id, this );
  },
  function(){},
  { "out": { "inline": 1 } }
)
Of course that has an output format that is specific to mapReduce and therefore not exactly the document you had, but all the fields are contained under "value":
{
"results" : [
{
"_id" : 0,
"value" : {
"_id" : 0,
"some_field" : "a",
"array_to_sort" : [
{
"a" : 1,
"b" : 0
},
{
"a" : 3,
"b" : 3
},
{
"a" : 3,
"b" : 4
}
]
}
}
],
}
Future releases (as of writing) allow you to use a $$ROOT variable in aggregate to represent the document:
db.c.aggregate([
  { "$project": {
    "_id": "$$ROOT",
    "array_to_sort": "$array_to_sort"
  }},
  { "$unwind": "$array_to_sort" },
  { "$sort": { "array_to_sort.b": 1, "array_to_sort.a": 1 } },
  { "$group": {
    "_id": "$_id",
    "array_to_sort": { "$push": "$array_to_sort" }
  }}
]);
So there is no point in using a final "project" stage there, as you do not actually know the other fields in the document. But they will all be contained (including the original array and its order) within the _id field of the result document.
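If your server is new enough to have $addFields and $replaceRoot (both later additions than this answer assumes), a hedged sketch of restoring the original shape with the sorted array merged back in might look like:
db.c.aggregate([
  { "$project": { "_id": "$$ROOT", "array_to_sort": "$array_to_sort" } },
  { "$unwind": "$array_to_sort" },
  { "$sort": { "array_to_sort.b": 1, "array_to_sort.a": 1 } },
  { "$group": { "_id": "$_id", "array_to_sort": { "$push": "$array_to_sort" } } },
  // overwrite the copied document's array with the sorted one, then promote it back to the root
  { "$addFields": { "_id.array_to_sort": "$array_to_sort" } },
  { "$replaceRoot": { "newRoot": "$_id" } }
]);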