MongoDB $sort usage - mongodb

This is my database/document.
Running:
db.Students.find().pretty()
Result is:
{
"_id" : 1,
"scores" : [
{
"attempt" : 1,
"score" : 5
},
{
"attempt" : 2,
"score" : 10
},
{
"attempt" : 3,
"score" : 7
},
{
"attempt" : 4,
"score" : 9
}
]
}
How to display the scores in descending order using $sort ?

Well you cannot do that using .find() as any .sort() modifier there is actually sorting the documents and not the contents of your array. But you can do that using .aggregate():
db.Students.aggregate([
// Unwind the array to de-normalize
{ "$unwind": "$scores" },
// Sort the documents with the scores descending
{ "$sort": { "_id": 1, "scores.score": -1 } },
// Group back to an array
{ "$group": {
"_id": "$_id",
"scores": { "$push": "$scores" }
}}
])
So once all the elements are "de-normalized" into individual documents, the $sort pipeline stage takes care of re-arranging the order.

Related

MongoDB aggregate nested array correctly

OK I am very new to Mongo, and I am already stuck.
Db has the following structure (much simplified for sure):
{
{
"_id" : ObjectId("57fdfbc12dc30a46507044ec"),
"keyterms" : [
{
"score" : "2",
"value" : "AA",
},
{
"score" : "2",
"value" : "AA",
},
{
"score" : "4",
"value" : "BB",
},
{
"score" : "3",
"value" : "CC",
}
]
},
{
"_id" : ObjectId("57fdfbc12dc30a46507044ef"),
"keyterms" : [
...
There are some Objects. Each Object have an array "keywords". Each of this Arrays Entries, which have score and value. There are some duplicates though (not really, since in the real db the keywords entries have much more fields, but concerning value and score they are duplicates).
Now I need a query, which
selects one object by id
groups its keyterms in by value
and counts the dublicates
sorts them by score
So I want to have something like that as result
// for Object 57fdfbc12dc30a46507044ec
"keyterms"; [
{
"score" : "4",
"value" : "BB",
"count" : 1
},
{
"score" : "3",
"value" : "CC",
"count" : 1
}
{
"score" : "2",
"value" : "AA",
"count" : 2
}
]
In SQL I would have written something like this
select
score, value, count(*) as count
from
all_keywords_table_or_some_join
group by
value
order by
score
But, sadly enough, it's not SQL.
In Mongo I managed to write this:
db.getCollection('tests').aggregate([
{$match: {'_id': ObjectId('57fdfbc12dc30a46507044ec')}},
{$unwind: "$keyterms"},
{$sort: {"keyterms.score": -1}},
{$group: {
'_id': "$_id",
'keyterms': {$push: "$keyterms"}
}},
{$project: {
'keyterms.score': 1,
'keyterms.value': 1
}}
])
But there is something missing: the grouping of the the keywords by their value. I can not get rid of the feeling, that this is the wrong approach at all. How can I select the keywords array and continue with that, and use an aggregate function inly on this - that would be easy.
BTW I read this
(Mongo aggregate nested array)
but I can't figure it out for my example unfortunately...
You'd want an aggregation pipeline where after you $unwind the array, you group the flattened documents by the array's value and score keys, aggregate the counts using the $sum accumulator operator and retain the main document's _id with the $first operator.
The preceding pipeline should then group the documents from the previous pipeline by the _id key so as to preserve the original schema and recreate the keyterms array using the $push operator.
The following demonstration attempts to explain the above aggregation operation:
db.tests.aggregate([
{ "$match": { "_id": ObjectId("57fdfbc12dc30a46507044ec") } },
{ "$unwind": "$keyterms" },
{
"$group": {
"_id": {
"value": "$keyterms.value",
"score": "$keyterms.score"
},
"doc_id": { "$first": "$_id" },
"count": { "$sum": 1 }
}
},
{ "$sort": {"_id.score": -1 } },
{
"$group": {
"_id": "$doc_id",
"keyterms": {
"$push": {
"value": "$_id.value",
"score": "$_id.score",
"count": "$count"
}
}
}
}
])
Sample Output
{
"_id" : ObjectId("57fdfbc12dc30a46507044ec"),
"keyterms" : [
{
"value" : "BB",
"score" : "4",
"count" : 1
},
{
"value" : "CC",
"score" : "3",
"count" : 1
},
{
"value" : "AA",
"score" : "2",
"count" : 2
}
]
}
Demo
Meanwhile, I solved it myself:
aggregate([
{$match: {'_id': ObjectId('57fdfbc12dc30a46507044ec')}},
{$unwind: "$keyterms"},
{$sort: {"keyterms.score": -1}},
{$group: {
'_id': "$keyterms.value",
'keyterms': {$push: "$keyterms"},
'escore': {$first: "$keyterms.score"},
'evalue': {$first: "$keyterms.value"}
}},
{$limit: 15},
{$project: {
"score": "$escore",
"value": "$evalue",
"count": {$size: "$keyterms"}
}}
])

Combining group and project in mongoDB aggregation framework

my document looks like this:
{
"_id" : ObjectId("5748d1e2498ea908d588b65e"),
"some_item" : {
"_id" : ObjectId("5693afb1b49eb7d5ed97de14"),
"item_property_1" : 1.0,
"item_property_2" : 2.0,
},
"timestamp" : "2016-05-28",
"price_information" : {
"arbitrary_value" : 111,
"hourly_rates" : [
{
"price" : 74.45,
"hour" : "0"
},
{
"price" : 74.45,
"hour" : "1"
},
{
"price" : 74.45,
"hour" : "2"
},
]
}
}
I did average the price per day via:
db.hourly.aggregate([
{$match: {timestamp : "2016-05-28"}},
{$unwind: "$price_information.hourly_rates"},
{$group: { _id: "$unique_item_identifier", total_price: { $avg: "$price_information.hourly_rates.price"}}}
]);
I am struggling with bringing (projecting) other params with in the result set. I would like to have also some_item and timestampin the result set. I tried to use a $project: {some_item: 1, total_price: 1, ...} within the query, but that wasn't right.
My desired output would be like:
{
"_id" : ObjectId("5693afb1b49eb7d5ed97de27"),
"someItem" : {
"_id" : ObjectId("5693afb1b49eb7d5ed97de14"),
"item_property_1" : 1.0,
"item_property_2" : 2.0,
},
"timestamp" : "2016-05-28",
"price_information" : {
"avg_price": 34
}
}
If somebody could give me a hint, how to project the grouping and the other params into the result set, I would be thankful.
Best
Rob
If using MongoDB 3.2 and newer, you can use $avg in the $project pipeline since it returns the average of the specified expression or list of expressions for each document e.g
db.hourly.aggregate([
{ "$match": { "timestamp": "2016-05-28" } },
{
"$project": {
"price_information": {
"avg_price": { "$avg": "$price_information.hourly_rates.price" }
},
"someItem": 1,
"timestamp": 1,
}
}
]);
In previous versions of MongoDB, $avg is available in the $group stage only. So to include the other fields, use the $first operator in your grouping:
db.hourly.aggregate([
{ "$match": { "timestamp": "2016-05-28" } },
{ "$unwind": "$price_information.hourly_rates" },
{
"$group": {
"_id": "$_id",
"avg_price": { "$avg": "$price_information.hourly_rates.price" },
"someItem": { "$first": "$some_item" },
"timestamp": { "$first": "$timestamp" },
}
},
{
"$project": {
"price_information": { "avg_price": "$avg_price" },
"someItem": 1
"timestamp": 1
}
}
]);
Note: Usage of the $first operator in a $group stage will largely depend on how the documents getting in that pipeline are ordered as well as the group by key. Because $first will returns the first document value in a group of documents that share the same group by key, the $group stage logically should precede a $sort stage to have the input documents in a defined order. This is only sensible to use when you know the order that the data is being processed in.
However, as the above is grouping by the main document's _id key, the $first operator when applied to non-denormalized fields (and not the flattened price_information array fields) will guarantee the original value in the result. Hence no need for a pre-sort stage to define the order since it won't be necessary in this case.

Limiting Query result in MongoDB

I have 20,000+ documents in my mongodb. I just learnt that you cannot query them all in one go.
So my question is this:
I want to get my document using find(query) then limit its results for 3 documents only and I can choose where those documents start from.
For example if my find() query resulted in 8 documents :
[{doc1}, {doc2}, {doc3}, {doc4}, {doc5}, {doc6}, {doc7}, {doc 8}]
command limit(2, 3) will gives [doc3, doc4, doc5]
And I also need to get total count for all that result(without limit) for example : length() will give 8 (the number of total document resulted from find() function)
Any suggestion? Thanks
add .skip(2).limit(3) to the end of your query
I suppose you have the following documents in your collection.
{ "_id" : ObjectId("56801243fb940e32f3221bc2"), "a" : 0 }
{ "_id" : ObjectId("56801243fb940e32f3221bc3"), "a" : 1 }
{ "_id" : ObjectId("56801243fb940e32f3221bc4"), "a" : 2 }
{ "_id" : ObjectId("56801243fb940e32f3221bc5"), "a" : 3 }
{ "_id" : ObjectId("56801243fb940e32f3221bc6"), "a" : 4 }
{ "_id" : ObjectId("56801243fb940e32f3221bc7"), "a" : 5 }
{ "_id" : ObjectId("56801243fb940e32f3221bc8"), "a" : 6 }
{ "_id" : ObjectId("56801243fb940e32f3221bc9"), "a" : 7 }
From MongoDB 3.2 you can use the .aggregate() method and the $slice operator.
db.collection.aggregate([
{ "$group": {
"_id": null,
"count": { "$sum": 1 },
"docs": { "$push": "$$ROOT" }
}},
{ "$project": {
"count": 1,
"_id": 0,
"docs": { "$slice": [ "$docs", 2, 3 ] }
}}
])
Which returns:
{
"count" : 8,
"docs" : [
{
"_id" : ObjectId("56801243fb940e32f3221bc4"),
"a" : 2
},
{
"_id" : ObjectId("56801243fb940e32f3221bc5"),
"a" : 3
},
{
"_id" : ObjectId("56801243fb940e32f3221bc6"),
"a" : 4
}
]
}
You may want to sort your document before grouping using the $sort operator.
From MongoDB 3.0 backwards you will need to first $group your documents and use the $sum accumulator operator to return the "count" of documents; also in that same group stage you need to use the $push and the $$ROOT variable to return an array of all your documents. The next stage in the pipeline will then be the $unwind stage where you denormalize that array. From there use use the $skip and $limit operators respectively skip the first 2 documents and passes 3 documents to the next stage which is another $group stage.
db.collection.aggregate([
{ "$group": {
"_id": null,
"count": { "$sum": 1 },
"docs": { "$push": "$$ROOT" }
}},
{ "$unwind": "$docs" },
{ "$skip": 2 },
{ "$limit": 3 },
{ "$group": {
"_id": "$_id",
"count": { "$first": "$count" },
"docs": { "$push": "$docs" }
}}
])
As #JohnnyHK pointed out in this comment
$group is going to read all documents and build a 20k element array with them just to get three docs.
You should then run two queries using find()
db.collection.find().skip(2).limit(3)
and
db.collection.count()

sort array in query and project all fields

I would like to sort a nested array at query time while also projecting all fields in the document.
Example document:
{ "_id" : 0, "unknown_field" : "foo", "array_to_sort" : [ { "a" : 3, "b" : 4 }, { "a" : 3, "b" : 3 }, { "a" : 1, "b" : 0 } ] }
I can perform the sorting with an aggregation but I cannot preserve all the fields I need. The application does not know at query time what other fields may appear in each document, so I am not able to explicitly project them. If I had a wildcard to project all fields then this would work:
db.c.aggregate([
{$unwind: "$array_to_sort"},
{$sort: {"array_to_sort.b":1, "array_to_sort:a": 1}},
{$group: {_id:"$_id", array_to_sort: {$push:"$array_to_sort"}}}
]);
...but unfortunately, it produces a result that does not contain the "unknown_field":
{
"_id" : 0,
"array_to_sort" : [
{
"a" : 1,
"b" : 0
},
{
"a" : 3,
"b" : 3
},
{
"a" : 3,
"b" : 4
}
]
}
Here is the insert command incase you would like to experiment:
db.c.insert({"unknown_field": "foo", "array_to_sort": [{"a": 3, "b": 4}, {"a": 3, "b":3}, {"a": 1, "b":0}]})
I cannot pre-sort the array because the sort criteria is dynamic. I may be sorting by any combination of a and/or b ascending/descending at query time. I realize I may need to do this in my client application, but it would be sweet if I could do it in mongo because then I could also $slice/skip/limit the results for paging instead of retrieving the entire array every time.
Since you are grouping on the document _id you can simply place the fields you wish to keep within the grouping _id. Then you can re-form using $project
db.c.aggregate([
{ "$unwind": "$array_to_sort"},
{ "$sort": {"array_to_sort.b":1, "array_to_sort:a": 1}},
{ "$group": {
"_id": {
"_id": "$_id",
"unknown_field": "$unknown_field"
},
"Oarray_to_sort": { "$push":"$array_to_sort"}
}},
{ "$project": {
"_id": "$_id._id",
"unknown_field": "$_id.unknown_field",
"array_to_sort": "$Oarray_to_sort"
}}
]);
The other "trick" in there is using a temporary name for the array in the grouping stage. This is so when you $project and change the name, you get the fields in the order specified in the projection statement. If you did not, then the "array_to_sort" field would not be the last field in the order, as it is copied from the prior stage.
That is an intended optimization in $project, but if you want the order then you can do it as above.
For completely unknown structures there is the mapReduce way of doing things:
db.c.mapReduce(
function () {
this["array_to_sort"].sort(function(a,b) {
return a.a - b.a || a.b - b.b;
});
emit( this._id, this );
},
function(){},
{ "out": { "inline": 1 } }
)
Of course that has an output format that is specific to mapReduce and therefore not exactly the document you had, but all the fields are contained under "values":
{
"results" : [
{
"_id" : 0,
"value" : {
"_id" : 0,
"some_field" : "a",
"array_to_sort" : [
{
"a" : 1,
"b" : 0
},
{
"a" : 3,
"b" : 3
},
{
"a" : 3,
"b" : 4
}
]
}
}
],
}
Future releases ( as of writing ) allow you to use a $$ROOT variable in aggregate to represent the document:
db.c.aggregate([
{ "$project": {
"_id": "$$ROOT",
"array_to_sort": "$array_to_sort"
}},
{ "$unwind": "$array_to_sort"},
{ "$sort": {"array_to_sort.b":1, "array_to_sort:a": 1}},
{ "$group": {
"_id": "$_id",
"array_to_sort": { "$push":"$array_to_sort"}
}}
]);
So there is no point there using the final "project" stage as you do not actually know the other fields in the document. But they will all be contained (including the original array and order ) within the _id field of the result document.

Mongodb Aggregation grouping with leave the field

After applying the aggregation
db.grades.aggregate([
{$match: {'type': 'homework'}},
{$sort: {'student_id':1, 'score':1}}
])
got the result:
{
"result" : [
{
"_id" : ObjectId("50906d7fa3c412bb040eb579"),
"student_id" : 0,
"type" : "homework",
"score" : 14.8504576811645
},
{
"_id" : ObjectId("50906d7fa3c412bb040eb57a"),
"student_id" : 0,
"type" : "homework",
"score" : 63.98402553675503
},
...
How to modify the request to leave documents with a minimum value score and get a result which kept the field id. For example, in such a way:
{
"_id" : ObjectId("50906d7fa3c412bb040eb579"),
"score" : 14.8504576811645
}
Thanks.
Is this a homework question from the education site? I can't remember, but this is fairly trivial.
db.grades.aggregate([
{ "$match": { type: 'homework' } },
{ "$sort": {student_id: 1, score: 1} },
{ "$group": {
"_id": "$student_id",
"doc": { "$first": "$_id"},
"score": { "$first": "$score"}
}},
{ "$sort: { "_id": 1 } },
{ "$project": {
"_id": "$doc",
"score": 1
}}
])
All this does is use $first to get the first result when grouping by student_id. By first it means exactly that, so this is only useful after sorting and is different from $min which would take the smallest value from the grouped results.
So if you got part of the way there, not only do you keep the first score, but you also do the same operation on the _id value as well.
The additional sort is only there so the results don't trip you up, because they are likely to appear in the reverse order of student_id. Finally there is just a small use of $project to get the document form that you want.