sort array in query and project all fields - mongodb

I would like to sort a nested array at query time while also projecting all fields in the document.
Example document:
{ "_id" : 0, "unknown_field" : "foo", "array_to_sort" : [ { "a" : 3, "b" : 4 }, { "a" : 3, "b" : 3 }, { "a" : 1, "b" : 0 } ] }
I can perform the sorting with an aggregation but I cannot preserve all the fields I need. The application does not know at query time what other fields may appear in each document, so I am not able to explicitly project them. If I had a wildcard to project all fields then this would work:
db.c.aggregate([
{$unwind: "$array_to_sort"},
{$sort: {"array_to_sort.b":1, "array_to_sort.a": 1}},
{$group: {_id:"$_id", array_to_sort: {$push:"$array_to_sort"}}}
]);
...but unfortunately, it produces a result that does not contain the "unknown_field":
{
"_id" : 0,
"array_to_sort" : [
{
"a" : 1,
"b" : 0
},
{
"a" : 3,
"b" : 3
},
{
"a" : 3,
"b" : 4
}
]
}
Here is the insert command in case you would like to experiment:
db.c.insert({"unknown_field": "foo", "array_to_sort": [{"a": 3, "b": 4}, {"a": 3, "b":3}, {"a": 1, "b":0}]})
I cannot pre-sort the array because the sort criteria are dynamic. I may be sorting by any combination of a and/or b, ascending or descending, at query time. I realize I may need to do this in my client application, but it would be sweet if I could do it in mongo because then I could also $slice/skip/limit the results for paging instead of retrieving the entire array every time.

Since you are grouping on the document _id, you can simply place the fields you wish to keep within the grouping _id. Then you can re-form the document using $project:
db.c.aggregate([
{ "$unwind": "$array_to_sort"},
{ "$sort": {"array_to_sort.b":1, "array_to_sort.a": 1}},
{ "$group": {
"_id": {
"_id": "$_id",
"unknown_field": "$unknown_field"
},
"Oarray_to_sort": { "$push":"$array_to_sort"}
}},
{ "$project": {
"_id": "$_id._id",
"unknown_field": "$_id.unknown_field",
"array_to_sort": "$Oarray_to_sort"
}}
]);
The other "trick" in there is using a temporary name for the array in the grouping stage. This is so when you $project and change the name, you get the fields in the order specified in the projection statement. If you did not, then the "array_to_sort" field would not be the last field in the order, as it is copied from the prior stage.
That is an intended optimization in $project, but if you want the order then you can do it as above.
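Since the sort criteria in the question are dynamic, the $sort stage itself can be built at query time. Here is a minimal sketch in plain JavaScript; buildSortStage and its arguments are hypothetical names made up for illustration, not part of MongoDB:

```javascript
// Hypothetical helper: builds the $sort stage for an unwound array field
// from a list of [subfield, direction] pairs (1 = ascending, -1 = descending).
function buildSortStage(arrayField, criteria) {
  const spec = {};
  for (const [subfield, direction] of criteria) {
    // Dot notation, e.g. "array_to_sort.b"
    spec[arrayField + "." + subfield] = direction;
  }
  return { $sort: spec };
}

// Sort by b ascending, then a descending
const sortStage = buildSortStage("array_to_sort", [["b", 1], ["a", -1]]);
console.log(JSON.stringify(sortStage));
```

The returned object would take the place of the literal $sort stage in the pipeline. The order of the pairs matters, because the first key is the primary sort key.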
For completely unknown structures there is the mapReduce way of doing things:
db.c.mapReduce(
function () {
this["array_to_sort"].sort(function(a,b) {
return a.a - b.a || a.b - b.b;
});
emit( this._id, this );
},
function(){},
{ "out": { "inline": 1 } }
)
Of course that has an output format that is specific to mapReduce and is therefore not exactly the document you had, but all the fields are contained under "value":
{
"results" : [
{
"_id" : 0,
"value" : {
"_id" : 0,
"some_field" : "a",
"array_to_sort" : [
{
"a" : 1,
"b" : 0
},
{
"a" : 3,
"b" : 3
},
{
"a" : 3,
"b" : 4
}
]
}
}
],
}
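The comparator used inside the map function can be tried on its own in plain JavaScript. This sketch uses the sample array from the question; the names arrayToSort and byAThenB are only for illustration:

```javascript
// Sample array from the question
const arrayToSort = [{ a: 3, b: 4 }, { a: 3, b: 3 }, { a: 1, b: 0 }];

// Same comparator as in the map function: order by "a", then by "b" on ties.
// When x.a - y.a is 0 (falsy), the || falls through to the second difference.
function byAThenB(x, y) {
  return x.a - y.a || x.b - y.b;
}

// slice() copies the array so the original is left untouched
const sorted = arrayToSort.slice().sort(byAThenB);
console.log(JSON.stringify(sorted));
```

Swapping the two terms orders by b first and a second, which is what the $sort stage in the question asks for. For this sample data both orders give the same result.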
As of MongoDB 2.6 you can use the $$ROOT variable in aggregate to represent the whole document:
db.c.aggregate([
{ "$project": {
"_id": "$$ROOT",
"array_to_sort": "$array_to_sort"
}},
{ "$unwind": "$array_to_sort"},
{ "$sort": {"array_to_sort.b":1, "array_to_sort.a": 1}},
{ "$group": {
"_id": "$_id",
"array_to_sort": { "$push":"$array_to_sort"}
}}
]);
There is no point in adding a final $project stage here, since you do not actually know the other fields in the document. They will all be contained (including the original array in its original order) within the _id field of the result document.
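On newer servers the original document shape can be restored as well. The following is only a sketch, assuming a version that has both $replaceRoot and $mergeObjects (3.6 or later, as far as I know); the temporary field names doc and sorted are made up for illustration:

```javascript
// Sketch: keep the whole document via $$ROOT, then merge the sorted array back.
// "doc" and "sorted" are hypothetical temporary field names.
const keepAllFieldsPipeline = [
  { $unwind: "$array_to_sort" },
  { $sort: { "array_to_sort.b": 1, "array_to_sort.a": 1 } },
  { $group: {
      _id: "$_id",
      doc: { $first: "$$ROOT" },           // any unwound copy carries the unknown fields
      sorted: { $push: "$array_to_sort" }  // elements arrive in sorted order
  }},
  { $replaceRoot: {
      // The later object wins, so the sorted array overwrites the unwound element
      newRoot: { $mergeObjects: ["$doc", { array_to_sort: "$sorted" }] }
  }}
];

// In the mongo shell this would be run as: db.c.aggregate(keepAllFieldsPipeline)
console.log(JSON.stringify(keepAllFieldsPipeline, null, 2));
```

Note that $unwind drops documents whose array is empty or missing, so those documents would not appear in the output of this sketch.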

Related

Mongodb - group by same value in different fields in different documents

I have documents with common values in different fields that I want to group by that value. Simplified records are:
{ _id:1,
"Home" : "A",
"Away" : "B" }
{ _id:2,
"Home" : "B",
"Away" : "C" }
{ _id:3,
"Home" : "C",
"Away" : "A" }
{ _id:4,
"Home" : "C",
"Away" : "B" }
{ _id:5,
"Home" : "A",
"Away" : "C" }
I am trying to get an aggregate group result that includes, for example, the value "A" whether it appears in a document in the field "Home", or the field "Away". The result I want is:
{"_id": "A", "count": 3},
{"_id": "B", "count": 3},
{"_id": "C", "count": 4}
Grouping by either "Home" or "Away" is no problem, but that wouldn't give me all the records. With the grouping below, I wouldn't get a count of records where "A" or "B" or "C" was in the "Home" field:
{$group:
{_id: "$Away"} etc... }
I have tried using $cond from other posts here as follows:
$group : {
_id : {
$cond : [{
$gt : [ "$Away", null]
}, "$Home"]
}
}
Also tried an $or which is pretty obviously wrong since it will only find the same value for Away and Home fields within each document (which is never the case):
$group : {
_id : {
$or : [ "$Away", "$Home"]
}
}
I'm stuck and not sure if it is even possible to group on a value that may be in different fields in different documents.
You can build an object, convert it with $objectToArray, $unwind it, and then group, like this:
Create an object with $set that holds the two values ($Home and $Away).
Use $project to drop the original fields from the next stage. They are not necessary, since you have the object.
Then apply $objectToArray and $unwind to get every value as its own document.
Finally, $group by the property v generated by $objectToArray.
db.collection.aggregate([
{
"$set": {
"obj": {
"Home": "$Home",
"Away": "$Away"
}
}
},
{
"$project": {"Away": 0,"Home": 0}
},
{
"$set": {"obj": {"$objectToArray": "$obj"}}
},
{
"$unwind": "$obj"
},
{
"$group": {
"_id": "$obj.v",
"count": {"$sum": 1}
}
}
])
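To sanity-check the counts expected in the question, the same "one entry per Home/Away value, then count" logic can be mimicked in plain JavaScript. This is only an illustration run against the sample records, not a replacement for the pipeline:

```javascript
// Sample records from the question
const matchDocs = [
  { _id: 1, Home: "A", Away: "B" },
  { _id: 2, Home: "B", Away: "C" },
  { _id: 3, Home: "C", Away: "A" },
  { _id: 4, Home: "C", Away: "B" },
  { _id: 5, Home: "A", Away: "C" }
];

const counts = {};
for (const doc of matchDocs) {
  // Equivalent of $objectToArray + $unwind: one entry per Home/Away value
  for (const value of [doc.Home, doc.Away]) {
    // Equivalent of $group with { $sum: 1 }
    counts[value] = (counts[value] || 0) + 1;
  }
}
console.log(counts); // { A: 3, B: 3, C: 4 }
```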

how to query records which field A before field B in the doc

Does MongoDB support a query like this?
For example, I have data like this:
> db.foo.find()
{ "_id" : 1, "x" : 1, "y" : 2, "z" : 3 }
{ "_id" : 2, "y" : 2, "x" : 1, "z" : 3 }
{ "_id" : 3, "z" : 3, "y" : 2, "x" : 1 }
Now I want to query the records in which field y comes before field x, that is, the last two records.
Does MongoDB support it?
You can use the following aggregation:
db.foo.aggregate([
{
$addFields: {
keys: {
$map: {
input: { $objectToArray: "$$ROOT" },
as: "item",
in: "$$item.k"
}
}
}
},
{
$match: {
$expr: { $lt: [ { $indexOfArray: [ "$keys", "y" ] } , { $indexOfArray: [ "$keys", "x" ] } ] }
}
},
{
$project: {
keys: 0
}
}
])
$objectToArray can transform your root document to an array of key-value pairs. Then you can use $indexOfArray to get the position of x and y keys and compare them using $expr.
Two things you need to be aware of (from the MongoDB documentation):
Updates that include renaming of field names may result in the reordering of fields in the document.
Starting in version 2.6, MongoDB actively attempts to preserve the field order in a document. Before version 2.6, MongoDB did not actively preserve the order of the fields in a document.
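The key-position comparison can be illustrated in plain JavaScript, where Object.keys plays the role of $objectToArray plus $map, and indexOf plays the role of $indexOfArray. The helper name comesBefore is hypothetical:

```javascript
// Sample documents from the question, with fields in the same order
const fooDocs = [
  { _id: 1, x: 1, y: 2, z: 3 },
  { _id: 2, y: 2, x: 1, z: 3 },
  { _id: 3, z: 3, y: 2, x: 1 }
];

// Hypothetical helper: true when key `first` appears before key `second`
function comesBefore(doc, first, second) {
  const keys = Object.keys(doc); // like $objectToArray + $map on "k"
  // indexOf returns -1 for a missing key, as $indexOfArray does, so a
  // document without `first` but with `second` would also pass this test
  return keys.indexOf(first) < keys.indexOf(second);
}

const matchingIds = fooDocs
  .filter(function (doc) { return comesBefore(doc, "y", "x"); })
  .map(function (doc) { return doc._id; });
console.log(matchingIds); // [ 2, 3 ]
```

If documents may lack either field, an extra check that both positions are not -1 would be needed, in this sketch and in the pipeline alike.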

Database sort on numeric values that are actually strings

I have the query below. field_a is a String property and field_b is an array of type Number. I want an array of the unique combinations of field_a and field_b. field_a contains numeric values, but in string format, so I want to apply a natural sort in the aggregation pipeline. $natural can only be used with a query such as db.collection.find().sort( { $natural: 1 } )
So how can I use natural sort in MongoDB, or do I need to depend on JS functions or on lodash/underscore.js?
db.collection.aggregate([
{"$group": { "_id": { field_a: "$field_a", field_b: "$field_b" } } },
{ $project: { a: "$_id" } },
{"$group": {"_id": 'a', "res": {"$addToSet": "$_id" }}},
{"$unwind": "$res"},
{"$sort": { "res": 1}},
{"$group": { "_id": null, "res": {"$push": "$res" }}}
])
In short, this is what you want to do here:
db.collection.aggregate([
{ "$group": {
"_id": {
"field_a": {
"$concat": [
{ "$substrCP": [
"0000000000",
0,
{ "$subtract": [ 10, { "$strLenCP": "$field_a" } ] }
]},
"$field_a"
]
},
"field_b": "$field_b"
}
}},
{ "$sort": { "_id": 1 } }
])
Explanation
As a basic concept, the problem you have is that "strings" sort in a way that does not translate to how numeric sorts work.
As a brief example, these documents use a string value:
{ "_id" : ObjectId("5928276f84c3559bc2fd458b"), "a" : "5" }
{ "_id" : ObjectId("5928277484c3559bc2fd458c"), "a" : "50" }
{ "_id" : ObjectId("5928277e84c3559bc2fd458d"), "a" : "60" }
{ "_id" : ObjectId("5928278284c3559bc2fd458e"), "a" : "6" }
If you try to sort these, then the lexical order applies:
> db.list.find().sort({ "a": 1 })
{ "_id" : ObjectId("5928276f84c3559bc2fd458b"), "a" : "5" }
{ "_id" : ObjectId("5928277484c3559bc2fd458c"), "a" : "50" }
{ "_id" : ObjectId("5928278284c3559bc2fd458e"), "a" : "6" }
{ "_id" : ObjectId("5928277e84c3559bc2fd458d"), "a" : "60" }
As strings, this makes sense, since "50" begins with "5" and therefore sorts before "6".
At the time of writing, the aggregation framework could not "cast" these to numeric values (MongoDB 4.0 later added $toInt and $convert), so the option used here is to present the "strings" in a way in which they order lexically the same way as they would as numeric values.
In brief terms we "zero pad" them: the values become fixed-length strings prefixed or "padded" with 0, so that the "strings" order the same way they would numerically:
db.list.aggregate([
{ "$project": {
"a": {
"$concat": [
{ "$substrCP": [
"0000000000",
0,
{ "$subtract": [ 10, { "$strLenCP": "$a" } ] }
]},
"$a"
]
}
}},
{ "$sort": { "a": 1 } }
])
And this will produce a list in order like:
{ "_id" : ObjectId("5928276f84c3559bc2fd458b"), "a" : "0000000005" }
{ "_id" : ObjectId("5928278284c3559bc2fd458e"), "a" : "0000000006" }
{ "_id" : ObjectId("5928277484c3559bc2fd458c"), "a" : "0000000050" }
{ "_id" : ObjectId("5928277e84c3559bc2fd458d"), "a" : "0000000060" }
The basic premise here is that you take a "template" string, which is in this example a string of 0 which is 10 characters long. Then we look at the length of the field data to transform using $strLenCP and $subtract that length from 10 which is the length of the template string used here.
The difference in length is fed to the $substrCP operator as the number of characters to take from the template. This output is then fed to $concat in order to make a "string" which like the template is 10 characters long, but starts with zeros and ends in the initial numeric string.
In your actual usage, this ends up as part of a composite key. Since the transformed key comes first in the key order, simply sorting by _id sorts primarily by the values in the first key and then by the second.
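The padding arithmetic can be checked outside the database. The sketch below mirrors the $substrCP / $strLenCP / $concat steps in plain JavaScript; zeroPad and TEMPLATE are illustrative names, and the sketch assumes the values are never longer than the 10-character template:

```javascript
const TEMPLATE = "0000000000"; // 10 characters, as in the pipeline

// Illustrative helper mirroring the pipeline expression
function zeroPad(value) {
  // Take (10 - length) zeros from the template, then append the value,
  // like $substrCP with a $subtract length followed by $concat
  return TEMPLATE.substring(0, TEMPLATE.length - value.length) + value;
}

const rawValues = ["5", "50", "60", "6"];

// Plain string sort: lexical order
console.log(rawValues.slice().sort());

// Padded string sort: matches numeric order
const sortedPadded = rawValues.map(zeroPad).sort();
console.log(sortedPadded);
```

A value longer than the template would come back unpadded from this sketch and would sort in the wrong place, so the template length should cover the longest value you expect.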

Documents in MongoDB where last n sub-array elements contain a value

Consider this set of data in MongoDB...
{
_id: 1,
name: "Johnny",
properties: [
{
type: "A",
value: 257,
date: "4/1/2014"
},
{
type: "A",
value: 200,
date: "4/2/2014"
},
{
type: "B",
value: 301,
date: "4/3/2014"
},
...]
}
What is the proper way to query the documents in which one (or more) of the last two "properties" elements has a value > x, or one (or more) of the last two "properties" elements of type "A" has a value > x?
If you can stomach modifying your insertion method, try the following.
Change your updates to push like this:
doc = { type : "A", "value" : 123, "date" : new Date() }
db.foo.update( {_id:1}, { "$push" : { "properties" : { "$each" : [ doc ], "$sort" : { date : -1} } } } )
This will give you an array of documents sorted in descending order by time, making the "most recent" document first.
You can now use the standard MongoDB dot notation to query against the 0, 1, etc elements of your properties array, which represent the most recent additions logically.
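With the array kept newest-first, a filter over positions 0 and 1 might be built like this. It is a sketch only; recentTwoFilter is a hypothetical helper, and it assumes every update goes through the sorted $push shown above:

```javascript
// Hypothetical helper: builds a find() filter that checks the first two
// array positions, which hold the most recent elements after the sorted $push.
function recentTwoFilter(minValue, type) {
  const clauses = [0, 1].map(function (i) {
    const clause = {};
    clause["properties." + i + ".value"] = { $gt: minValue };
    // The type condition is optional and applies to the same position
    if (type !== undefined) {
      clause["properties." + i + ".type"] = type;
    }
    return clause;
  });
  return { $or: clauses };
}

// In the mongo shell: db.foo.find(recentTwoFilter(200, "A"))
console.log(JSON.stringify(recentTwoFilter(200, "A"), null, 2));
```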
As per the comments, the aggregation framework is for a lot more than simply "aggregating" values, so you can take advantage of the various pipeline operators to do very advanced things that cannot be achieved simply using .find()
db.collection.aggregate([
// Match documents that "could" meet the conditions to narrow down
{ "$match": {
"properties": { "$elemMatch": {
"type": "A", "value": { "$gt": 200 }
}}
}},
// Keep a copy of the document for later with an array copy
{ "$project": {
"_id": {
"_id": "$_id",
"name": "$name",
"properties": "$properties"
},
"properties": 1
}},
// Unwind the array to "de-normalize"
{ "$unwind": "$properties" },
// Get the "last" element of the array and copy the existing one
{ "$group": {
"_id": "$_id",
"properties": { "$last": "$_id.properties" },
"last": { "$last": "$properties" },
"count": { "$sum": 1 }
}},
// Unwind the copy again
{ "$unwind": "$properties" },
// Project to mark the element you already have
{ "$project": {
"properties": 1,
"last": 1,
"count": 1,
"seen": { "$eq": [ "$properties", "$last" ] }
}},
// Match again, being careful to keep any array with one element only
// This gets rid of the element you already kept
{ "$match": {
"$or": [
{ "seen": false },
{ "seen": true, "count": 1 }
]
}},
// Group to get the second last element as "next"
{ "$group": {
"_id": "$_id",
"last": { "$last": "$last" },
"next": { "$last": "$properties" }
}},
// Then match to see if either of those elements fits
{ "$match": {
"$or": [
{ "last.type": "A", "last.value": { "$gt": 200 } },
{ "next.type": "A", "next.value": { "$gt": 200 } }
]
}},
// Finally restore your matching documents
{ "$project": {
"_id": "$_id._id",
"name": "$_id.name",
"properties": "$_id.properties"
}}
])
Running through that in a bit more detail:
The first $match usage is to make sure you are only working on documents that can "possibly" match your extended conditions. Always a good idea to optimize like this.
The next stage is to $project since you likely want to keep the original document detail and you are at least going to need the array again in order to get the second last element.
The next stages make use of $unwind in order to break the array into individual documents which is then followed by $group which is used to find the last item on the document _id boundary. This is actually the last item in the array. Plus you keep a count of the array elements.
So then, after using $unwind again on the original array content, $project adds a "seen" field to the document, indicating via the $eq operator whether or not the element from the original array is the one that was previously kept as the "last" element.
After that stage you issue a $match again to filter that last element from the result, while making sure in the condition that you do not remove anything where the array length is actually 1.
From here you $group again in order to get the "second last" element from the array (or indeed the same "last" element where there was only one).
The final steps are simply to $match where either of those last two elements meets the conditions, and then finally $project the document in its original form.
So while that is fairly involved and of course increases in complexity by the number of items you want to test at the end of the array it can be done, and shows how aggregate is very suited to the problem.
Where possible it is the best approach, as invoking the JavaScript interpreter carries an overhead compared to the native code used by aggregate.
Using mapReduce would remove the code complexity for taking the last two possible elements (or more) but it will invoke the JavaScript interpreter by nature and will therefore run much more slowly.
For the record, since the sample in the question would not be a match, here is some data in which the last two documents match, one of which has only one element in the array:
{
"_id" : 1,
"name" : "Johnny",
"properties" : [
{
"type" : "A",
"value" : 257,
"date" : "4/1/2014"
},
{
"type" : "A",
"value" : 200,
"date" : "4/2/2014"
},
{
"type" : "B",
"value" : 301,
"date" : "4/3/2014"
}
]
}
{
"_id" : 2,
"name" : "Ace",
"properties" : [
{
"type" : "A",
"value" : 257,
"date" : "4/1/2014"
},
{
"type" : "B",
"value" : 200,
"date" : "4/2/2014"
},
{
"type" : "B",
"value" : 301,
"date" : "4/3/2014"
}
]
}
{
"_id" : 3,
"name" : "Bo",
"properties" : [
{
"type" : "A",
"value" : 257,
"date" : "4/1/2014"
}
]
}
{
"_id" : 4,
"name" : "Sue",
"properties" : [
{
"type" : "A",
"value" : 257,
"date" : "4/1/2014"
},
{
"type" : "A",
"value" : 240,
"date" : "4/2/2014"
},
{
"type" : "B",
"value" : 301,
"date" : "4/3/2014"
}
]
}
Have you considered using a $where clause? It is not the most efficient, but I think it should get you what you want. For instance, if you wanted every document where either of the last two properties elements has a value field greater than 200, you could try:
db.collection.find({properties:{$exists:true},
$where: "(this.properties[this.properties.length-1].value > 200) || " +
"(this.properties[this.properties.length-2].value > 200)"});
This needs some work for edge cases (array < 2 members for example) and more complex queries (by the "type" field too) but should get you started.
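One way the edge cases might be handled is sketched below as a standalone JavaScript predicate. lastTwoMatch is a hypothetical name; to use it with $where, the body would have to be inlined into the $where function with this.properties in place of the first argument, since the server cannot see functions defined in the shell:

```javascript
// Hypothetical predicate: does any of the last two elements have
// value > minValue (and, when given, the requested type)?
function lastTwoMatch(properties, minValue, type) {
  if (!Array.isArray(properties)) return false; // missing or non-array field
  // slice(-2) returns at most two elements and copes with short arrays
  return properties.slice(-2).some(function (p) {
    return p.value > minValue && (type === undefined || p.type === type);
  });
}

const johnny = [
  { type: "A", value: 257 }, { type: "A", value: 200 }, { type: "B", value: 301 }
];
const bo = [{ type: "A", value: 257 }];

console.log(lastTwoMatch(johnny, 200));      // true: the last element is 301
console.log(lastTwoMatch(johnny, 200, "A")); // false: the only "A" in the last two is 200
console.log(lastTwoMatch(bo, 200, "A"));     // true: single-element array
console.log(lastTwoMatch([], 200));          // false: empty array
```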

MongoDB $sort usage

This is my database/document.
Running:
db.Students.find().pretty()
Result is:
{
"_id" : 1,
"scores" : [
{
"attempt" : 1,
"score" : 5
},
{
"attempt" : 2,
"score" : 10
},
{
"attempt" : 3,
"score" : 7
},
{
"attempt" : 4,
"score" : 9
}
]
}
How to display the scores in descending order using $sort ?
Well you cannot do that using .find() as any .sort() modifier there is actually sorting the documents and not the contents of your array. But you can do that using .aggregate():
db.Students.aggregate([
// Unwind the array to de-normalize
{ "$unwind": "$scores" },
// Sort the documents with the scores descending
{ "$sort": { "_id": 1, "scores.score": -1 } },
// Group back to an array
{ "$group": {
"_id": "$_id",
"scores": { "$push": "$scores" }
}}
])
So once all the elements are "de-normalized" into individual documents, the $sort pipeline stage takes care of re-arranging the order.
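For comparison, this is the ordering the pipeline should produce for the sample document, worked out in plain JavaScript. It is a client-side illustration only; the variable names are made up:

```javascript
// Sample document from the question
const student = {
  _id: 1,
  scores: [
    { attempt: 1, score: 5 },
    { attempt: 2, score: 10 },
    { attempt: 3, score: 7 },
    { attempt: 4, score: 9 }
  ]
};

// Descending by score, like { "scores.score": -1 } in the $sort stage.
// slice() copies the array so the original document is not modified.
const sortedScores = student.scores.slice().sort(function (a, b) {
  return b.score - a.score;
});
console.log(sortedScores.map(function (s) { return s.score; })); // [ 10, 9, 7, 5 ]
```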