Sort and Group in one MongoDB aggregation query - mongodb

Using $sort and $group in one aggregation query behaves strangely.
Test data:
db.createCollection("test");
db.test.insert({
ts : 100,
category : 1
});
db.test.insert({
ts : 80,
category : 1
});
db.test.insert({
ts : 60,
category : 2
});
db.test.insert({
ts : 40,
category : 3
});
When I sort by ts alone, everything looks good, but when I use both $sort and $group, the results come back in the wrong order. Query:
db.test.aggregate([
{
$sort : {ts: 1}
},
{
$group:{"_id":"$category"}
}
]);
And the result, in reverse order:
{ "_id" : 1 }
{ "_id" : 2 }
{ "_id" : 3 }
Is this a MongoDB feature or my misunderstanding? Maybe MongoDB applies the grouping first and then cannot sort by a field that no longer exists. This is probably also why Mongoose prohibits using distinct with sorting.

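You need to $group first and then $sort the result. $group only outputs _id and the accumulator fields you define, so carry a ts value through the group (here the smallest ts per category, via $min) to have something to sort on. Since you only want the _id field, a final $project stage drops ts again.
db.test.aggregate([
  { "$group": { "_id": "$category", "ts": { "$min": "$ts" } }},
  { "$sort": { "ts": 1 }},
  { "$project": { "_id": 1 }}
]);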
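To see why the sort has to come after the grouping, here is a plain JavaScript sketch of the group-then-sort idea on the test data. It groups by category, keeps the smallest ts per group (mimicking a $min accumulator), then sorts the groups on that value. No MongoDB server is involved. The helper name groupThenSort is hypothetical, and the code only imitates what the stages do. It is not how the server implements them.

```javascript
// Test data from the question.
const docs = [
  { ts: 100, category: 1 },
  { ts: 80, category: 1 },
  { ts: 60, category: 2 },
  { ts: 40, category: 3 },
];

// Hypothetical helper imitating, in memory:
//   { $group: { _id: "$category", ts: { $min: "$ts" } } },
//   { $sort: { ts: 1 } },
//   { $project: { _id: 1 } }
function groupThenSort(input) {
  // $group: one entry per category, keeping the smallest ts seen so far.
  const groups = new Map();
  for (const d of input) {
    const prev = groups.get(d.category);
    if (prev === undefined || d.ts < prev) groups.set(d.category, d.ts);
  }
  // $sort on the per-group ts. (A JS Map keeps insertion order, but MongoDB's
  // $group promises no output order, so the explicit sort is what matters.)
  return [...groups]
    .map(([id, ts]) => ({ _id: id, ts }))
    .sort((a, b) => a.ts - b.ts)
    .map(({ _id }) => ({ _id })); // $project: keep only _id
}

console.log(groupThenSort(docs)); // [ { _id: 3 }, { _id: 2 }, { _id: 1 } ]
```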

If you want to sort the other way, do it like this:
db.test.aggregate([
{
$sort : {ts: -1}
},
{
$group:{"_id":"$category"}
}
]);
Notice the - in front of the 1.

When you $sort by ts first, you sort all the documents in your collection. If you were to run only the $sort stage in the aggregation pipeline, you would get the following result:
//Query
db.test.aggregate([
{ $sort: { ts: 1} }
]);
//Output
{ "_id" : ObjectId("55141da6e4c260ae9e00832b"), "ts" : 40, "category" : 3 }
{ "_id" : ObjectId("55141d9fe4c260ae9e00832a"), "ts" : 60, "category" : 2 }
{ "_id" : ObjectId("55141d99e4c260ae9e008329"), "ts" : 80, "category" : 1 }
{ "_id" : ObjectId("55141d93e4c260ae9e008328"), "ts" : 100, "category" : 1 }
In your code, the $group stage then groups the above results by the category field. $group does not guarantee any particular order for its output documents, so the sorted order of the input is not guaranteed to survive, and you get:
{ "_id" : 1 }
{ "_id" : 2 }
{ "_id" : 3 }
In the end it all depends on what you are trying to achieve.
If you want to return the categories ordered by the ts field, use only the $sort stage and then process the resulting data set on the client:
var data = db.test.aggregate([
  { $sort: { ts: 1 } },
  { $project: { _id: 0, ts: 1, category: 1 } }
]).toArray();
for (var i = 0; i < data.length; i++) {
  console.log(data[i].category); // prints 3, 2, 1, 1 in that sequence, on separate lines
}
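If what you actually want is each category once, in ts order (the distinct-with-sort behavior from the question), you can de-duplicate the sorted array on the client and keep the first occurrence of each category. This is a minimal sketch. It assumes the array has the shape produced by the $sort/$project pipeline above, and distinctInOrder is a hypothetical helper name.

```javascript
// Shape of the array returned by the $sort + $project pipeline above.
const data = [
  { ts: 40, category: 3 },
  { ts: 60, category: 2 },
  { ts: 80, category: 1 },
  { ts: 100, category: 1 },
];

// Hypothetical helper: keep the first occurrence of each category,
// so the order established by $sort is preserved.
function distinctInOrder(rows) {
  const seen = new Set();
  const result = [];
  for (const row of rows) {
    if (!seen.has(row.category)) {
      seen.add(row.category);
      result.push(row.category);
    }
  }
  return result;
}

console.log(distinctInOrder(data)); // [ 3, 2, 1 ]
```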

Related

Add field to documents after $sort aggregation pipeline which include its index in sorted list using MongoDb aggregation

I want to get a user's position in a list after a $sort stage in an aggregation pipeline.
Let's say we have a leaderboard, and I need to get my rank in it with a single query that returns only my data.
I have tried $addFields and some queries with $map.
Let's say we have these documents:
/* 1 createdAt:8/18/2019, 4:42:41 PM*/
{
"_id" : ObjectId("5d5963e1c6c93b2da849f067"),
"name" : "x4",
"points" : 69
},
/* 2 createdAt:8/18/2019, 4:42:41 PM*/
{
"_id" : ObjectId("5d5963e1c6c93b2da849f07b"),
"name" : "x24",
"points" : 968
},
/* 3 createdAt:8/18/2019, 4:42:41 PM*/
{
"_id" : ObjectId("5d5963e1c6c93b2da849f06a"),
"name" : "x7",
"points" : 997
},
And I want to write a query like this:
db.table.aggregate(
[
{ $sort : { points : 1 } },
{ $addFields: { order : "$index" } },
{ $match : { name : "x24" } }
]
)
I need to inject the order field with something like $index.
I expect to get something like this in return:
{
"_id" : ObjectId("5d5963e1c6c93b2da849f07b"),
"name" : "x24",
"points" : 968,
"order" : 2
}
I need something like the result's position metadata, which shows 2 here:
/* 2 createdAt:8/18/2019, 4:42:41 PM*/
One workaround for this situation is to put all your documents into a single array, resolve each document's index from that array with the help of $unwind, and finally project the data with the fields you need.
db.collection.aggregate([
{ $sort: { points: 1 } },
{
$group: {
_id: 1,
register: { $push: { _id: "$_id", name: "$name", points: "$points" } }
}
},
{ $unwind: { path: "$register", includeArrayIndex: "order" } },
{ $match: { "register.name": "x24" } },
{
$project: {
_id: "$register._id",
name: "$register.name",
points: "$register.points",
order: 1
}
}
]);
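Note that includeArrayIndex counts from 0, so the first document in the sorted list gets order 0. Add 1 (for example with $add) if you want a 1-based rank. Because every document is pushed into a single group, the result is subject to the 16 MB document size limit. To keep it efficient, narrow the input with $match or $limit before the $group wherever your requirements allow.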
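You can imitate the same steps in plain JavaScript to check what the pipeline returns for the sample documents. This is a sketch with no server involved, and rankOf is a hypothetical name. Because the index that includeArrayIndex produces is 0-based, the sketch adds 1 so that "x24" gets the rank of 2 the question expects.

```javascript
// Sample documents from the question (ObjectIds shown as plain strings).
const players = [
  { _id: "5d5963e1c6c93b2da849f067", name: "x4", points: 69 },
  { _id: "5d5963e1c6c93b2da849f07b", name: "x24", points: 968 },
  { _id: "5d5963e1c6c93b2da849f06a", name: "x7", points: 997 },
];

// Hypothetical helper imitating $sort -> $group/$push -> $unwind with
// includeArrayIndex -> $match, returning a 1-based rank.
function rankOf(docs, name) {
  // $sort: { points: 1 }, then $group/$push collects everything into one array.
  const register = [...docs].sort((a, b) => a.points - b.points);
  // $unwind with includeArrayIndex gives each element its 0-based position.
  const order = register.findIndex((d) => d.name === name);
  if (order === -1) return null; // $match found nothing
  const { _id, points } = register[order];
  return { _id, name, points, order: order + 1 };
}

console.log(rankOf(players, "x24")); // order: 2 for "x24"
```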

Calculate average of ratings in array, then add field to original document in MongoDB

I have documents with a field called ratings. This is an array of objects, each containing a userId and a ratingValue:
ratings: [
  { userId: "uidsample1", ratingValue: 5 },
  { userId: "uidsample2", ratingValue: 1.5 }
]
I want to do an aggregation pipeline to calculate the new average when one of the ratings in the array is updated or added. Then, I want to put that value in the document as a new field called averageRating.
I have tried unwinding and then using $addFields with $avg: "$ratings.ratingValue", but that adds the field to each unwound document and does not compute the average. It looks something like this (not exactly, since I'm testing in Compass):
db.test.aggregate([
  {
    $unwind: {
      path: "$ratings"
    }
  },
  {
    $addFields: {
      averageRating: {
        $avg: "$ratings.ratingValue"
      }
    }
  }
]);
What's a good query structure for this ?
You don't actually need to $unwind and $group to calculate the average; those operations are costly.
You can simply use $addFields with $avg:
db.col.aggregate([
{$addFields : {averageRating : {$avg : "$ratings.ratingValue"}}}
])
Sample collection and aggregation:
> db.t62.drop()
true
> db.t62.insert({data : {ratings : [{val : 1}, {val : 2}]}})
WriteResult({ "nInserted" : 1 })
> db.t62.find()
{ "_id" : ObjectId("5c44d9719d56bf65be5ab2e6"), "data" : { "ratings" : [ { "val" : 1 }, { "val" : 2 } ] } }
> db.t62.aggregate([{$addFields : {avg : {$avg : "$data.ratings.val"}}}])
{ "_id" : ObjectId("5c44d9719d56bf65be5ab2e6"), "data" : { "ratings" : [ { "val" : 1 }, { "val" : 2 } ] }, "avg" : 1.5 }
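Here is a plain JavaScript picture of what $avg does with an array operand inside $addFields. It averages the numeric values in each document's array and adds one field, without creating extra documents. This is a minimal sketch with a hypothetical addAverageRating helper. It skips non-numeric values, as $avg ignores them, and returns null when nothing numeric is left.

```javascript
// Documents shaped like the question's: ratings is an array of objects.
const docs = [
  { _id: 1, ratings: [{ userId: "uidsample1", ratingValue: 5 }, { userId: "uidsample2", ratingValue: 1.5 }] },
  { _id: 2, ratings: [] },
];

// Hypothetical helper imitating
//   { $addFields: { averageRating: { $avg: "$ratings.ratingValue" } } }
function addAverageRating(doc) {
  const values = (doc.ratings || [])
    .map((r) => r.ratingValue)
    .filter((v) => typeof v === "number"); // $avg ignores non-numeric values
  const averageRating =
    values.length === 0 ? null : values.reduce((s, v) => s + v, 0) / values.length;
  return { ...doc, averageRating }; // original fields kept, one field added
}

console.log(docs.map(addAverageRating).map((d) => d.averageRating)); // [ 3.25, null ]
```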
Use $group after $unwind, as below, to calculate averageRating. Aggregation is a read operation, so you need to update the document afterwards to store the value.
[
{
'$unwind': {
'path': '$ratings'
}
}, {
'$group': {
'_id': '$_id',
'averageRating': {
'$avg': '$ratings.ratingValue'
}
}
}
]
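For comparison, you can imitate the $unwind + $group route the same way. Unwinding produces one record per rating, and grouping by _id folds those records back into one average per document. Two differences from the $addFields version follow from how these stages work. Only _id and averageRating survive the $group, and a document with an empty ratings array disappears at the $unwind (its default behavior). This is a plain JavaScript sketch with no server, and unwindGroupAverage is a hypothetical name.

```javascript
const docs = [
  { _id: 1, ratings: [{ ratingValue: 5 }, { ratingValue: 1.5 }] },
  { _id: 2, ratings: [{ ratingValue: 4 }] },
  { _id: 3, ratings: [] },
];

function unwindGroupAverage(input) {
  // $unwind: one record per array element; empty arrays produce nothing.
  const unwound = input.flatMap((d) => d.ratings.map((r) => ({ _id: d._id, ratings: r })));
  // $group by _id with $avg over ratings.ratingValue.
  const acc = new Map();
  for (const u of unwound) {
    const entry = acc.get(u._id) || { sum: 0, count: 0 };
    entry.sum += u.ratings.ratingValue;
    entry.count += 1;
    acc.set(u._id, entry);
  }
  return [...acc].map(([id, { sum, count }]) => ({ _id: id, averageRating: sum / count }));
}

// averageRating is 3.25 for _id 1 and 4 for _id 2; _id 3 is dropped.
console.log(unwindGroupAverage(docs));
```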

Mongo aggregation framework: group users by age

I have a user base stored in mongo. Users may record their date of birth.
I need to run a report aggregating users by age.
I now have a pipeline that groups users by year of birth. However, that is not precise enough because most people are not born on January 1st; so even if they are born in, say, 1970, they may well not be 43 yet.
db.Users.aggregate([
{ $match : { "DateOfBirth" : { $exists : true} } },
{ $project : {"YearOfBirth" : {$year : "$DateOfBirth"} } },
{ $group : { _id : "$YearOfBirth", Total : { $sum : 1} } },
{ $sort : { "Total" : -1 } }
])
Do you know if it's possible to perform some kind of arithmetic within the aggregation framework to exactly calculate the age of a user? Or is this possible with MapReduce only?
It seems the whole thing is possible with the newly released MongoDB 2.4, which supports additional date operations (namely $subtract).
Here's how I did it:
db.Users.aggregate([
{ $match : { "DateOfBirth" : { $exists : true} } },
{ $project : {"ageInMillis" : {$subtract : [new Date(), "$DateOfBirth"] } } },
{ $project : {"age" : {$divide : ["$ageInMillis", 31558464000] }}},
// take the floor of the previous number:
{ $project : {"age" : {$subtract : ["$age", {$mod : ["$age",1]}]}}},
{ $group : { _id : "$age", Total : { $sum : 1} } },
{ $sort : { "Total" : -1 } }
])
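The arithmetic in that pipeline is easy to check in plain JavaScript. approxAge mirrors it: milliseconds since birth divided by 31558464000 (roughly 365.26 days), then floored. exactAge is a hypothetical calendar-based alternative. It subtracts one year if the birthday has not yet come round in the current year, which is the precision the question asks about. Both use UTC date parts. This is a sketch for checking the numbers, not a reference implementation.

```javascript
// Constant used in the pipeline above: about 365.26 days in milliseconds.
const MS_PER_YEAR = 31558464000;

// Mirrors the pipeline: $subtract dates, $divide, then subtract $mod 1
// (which floors a positive number).
function approxAge(dob, now) {
  const years = (now.getTime() - dob.getTime()) / MS_PER_YEAR;
  return years - (years % 1);
}

// Hypothetical calendar-based alternative: whole years elapsed, in UTC.
function exactAge(dob, now) {
  let age = now.getUTCFullYear() - dob.getUTCFullYear();
  const beforeBirthday =
    now.getUTCMonth() < dob.getUTCMonth() ||
    (now.getUTCMonth() === dob.getUTCMonth() && now.getUTCDate() < dob.getUTCDate());
  if (beforeBirthday) age -= 1;
  return age;
}

const dob = new Date(Date.UTC(1970, 2, 20)); // born 1970-03-20
const now = new Date(Date.UTC(2013, 2, 20)); // 43rd birthday
console.log(approxAge(dob, now), exactAge(dob, now)); // 42 43
```

On the birthday itself, the fixed-length year is slightly too long, so the approximate version still reports 42.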
There are not enough date-time and math operators to project out the age directly. But you might be able to create age ranges by composing a dynamic query.
Define your age ranges as cut-off dates:
dt18 = today - 18 years
dt25 = today - 25 years
...
dt65 = today - 65 years
Then use nested conditionals, progressively using the cut-off dates as age-group markers, like so:
db.folks.save({ "_id" : 1, "bd" : ISODate("2000-02-03T00:00:00Z") });
db.folks.save({ "_id" : 2, "bd" : ISODate("2010-06-07T00:00:00Z") });
db.folks.save({ "_id" : 3, "bd" : ISODate("1990-10-20T00:00:00Z") });
db.folks.save({ "_id" : 4, "bd" : ISODate("1964-09-23T00:00:00Z") });
db.folks.aggregate([
  {
    $project: {
      ageGroup: {
        $cond: [
          { $gt: ["$bd", ISODate("1995-03-19")] },
          "age0_18",
          {
            $cond: [
              { $gt: ["$bd", ISODate("1988-03-19")] },
              "age18_25",
              "age25_plus"
            ]
          }
        ]
      }
    }
  },
  {
    $group: {
      _id: "$ageGroup",
      count: { $sum: 1 }
    }
  }
]);
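The cut-off dates in that pipeline are hard-coded for one particular day. The sketch below computes them from a given "today" and applies the same nested conditions on the client, so you can check the boundaries against the sample folks documents. cutoff and ageGroup are illustrative, hypothetical names. The "today" value is an assumption: 2013-03-19 appears to be the date the ISODate literals were derived from. In the shell, computed Date values like these could replace the ISODate literals inside the $cond expressions.

```javascript
// Date that lies `years` calendar years before `today` (UTC).
function cutoff(today, years) {
  const d = new Date(today.getTime());
  d.setUTCFullYear(d.getUTCFullYear() - years);
  return d;
}

// Same nested conditions as the $cond expressions in the pipeline.
function ageGroup(bd, today) {
  if (bd > cutoff(today, 18)) return "age0_18";
  if (bd > cutoff(today, 25)) return "age18_25";
  return "age25_plus";
}

const today = new Date(Date.UTC(2013, 2, 19)); // assumed "today" (2013-03-19)
const folks = [
  { _id: 1, bd: new Date("2000-02-03T00:00:00Z") },
  { _id: 2, bd: new Date("2010-06-07T00:00:00Z") },
  { _id: 3, bd: new Date("1990-10-20T00:00:00Z") },
  { _id: 4, bd: new Date("1964-09-23T00:00:00Z") },
];

// Equivalent of the $group stage: count documents per age group.
const counts = {};
for (const f of folks) {
  const g = ageGroup(f.bd, today);
  counts[g] = (counts[g] || 0) + 1;
}
console.log(counts); // { age0_18: 2, age18_25: 1, age25_plus: 1 }
```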

Mongodb $match and $project aggregation

I have this very simple set of documents:
> db.ysTest.aggregate({$project:{_id:1,unitStatus:1}});
{
"result" : [
{
"_id" : ObjectId("514309f3e18aa7d14100217a"),
"unitStatus" : "es_pws"
},
{
"_id" : ObjectId("514309f3e18aa7d141002816"),
"unitStatus" : "es_run"
},
{
"_id" : ObjectId("514309f0e18aa7d14100021e")
}
],
"ok" : 1
}
When I use aggregate with $match and $project, I expect 1 document, but I get all of them.
Note: I'm using aggregate because this is going to be part of a more complicated match, but I tried to keep it simple for this example.
> db.ysTest.aggregate({
... $match: {
... unitStatus: {$exists: true, $nin: ["es_pws", "es_stl"]}
... },
... $project: {_id: 1,unitStatus:1}
... });
{
"result" : [
{
"_id" : ObjectId("514309f3e18aa7d14100217a"),
"unitStatus" : "es_pws"
},
{
"_id" : ObjectId("514309f3e18aa7d141002816"),
"unitStatus" : "es_run"
},
{
"_id" : ObjectId("514309f0e18aa7d14100021e")
}
],
"ok" : 1
}
What am i doing wrong ?
Looking at your documents, your query and the comments, you're not using the $group operator, and $match is simply a select clause that filters the results based on the criteria you give it. In your case:
$match: {
  unitStatus: {$exists: true, $nin: ["es_pws", "es_stl"]}
}
But neither $match nor $group guarantees that a single document is returned; what determines that is your schema and your query criteria.
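One thing worth checking in the question's call is that $match and $project are written as two keys of a single object. Each pipeline stage is meant to be its own object with exactly one operator, as in db.ysTest.aggregate([{ $match: {...} }, { $project: {...} }]). The plain JavaScript sketch below imitates running the two as separate, ordered stages over the three sample documents. With that ordering, only one document remains. matchStage, projectStage and runPipeline are hypothetical names, and no server is involved.

```javascript
// The three sample documents from the question (ObjectIds as strings).
const docs = [
  { _id: "514309f3e18aa7d14100217a", unitStatus: "es_pws" },
  { _id: "514309f3e18aa7d141002816", unitStatus: "es_run" },
  { _id: "514309f0e18aa7d14100021e" },
];

// Hypothetical stage functions imitating
//   { $match: { unitStatus: { $exists: true, $nin: ["es_pws", "es_stl"] } } }
//   { $project: { _id: 1, unitStatus: 1 } }
const matchStage = (rows) =>
  rows.filter((d) => "unitStatus" in d && !["es_pws", "es_stl"].includes(d.unitStatus));
const projectStage = (rows) => rows.map(({ _id, unitStatus }) => ({ _id, unitStatus }));

// Each stage receives the previous stage's output, in array order.
function runPipeline(rows, stages) {
  return stages.reduce((acc, stage) => stage(acc), rows);
}

const result = runPipeline(docs, [matchStage, projectStage]);
console.log(result.length, result[0].unitStatus); // 1 es_run
```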

Obtaining $group result with group count

Assuming I have a collection called "posts" (in reality it is a more complex collection, posts is too simple) with the following structure:
> db.posts.find()
{ "_id" : ObjectId("50ad8d451d41c8fc58000003"), "title" : "Lorem ipsum", "author" : "John Doe", "content" : "This is the content", "tags" : [ "SOME", "RANDOM", "TAGS" ] }
I expect this collection to grow to hundreds of thousands, perhaps millions, of documents. I need to query it for posts by tag, group the results by tag, and display them paginated. This is where the aggregation framework comes in. I plan to use the aggregate() method to query the collection:
db.posts.aggregate([
{ "$unwind" : "$tags" },
{ "$group" : {
_id: { tag: "$tags" },
count: { $sum: 1 }
} }
]);
The catch is that to build the paginator I need to know the length of the output array. I know that one way to get it is:
db.posts.aggregate([
{ "$unwind" : "$tags" },
{ "$group" : {
_id: { tag: "$tags" },
count: { $sum: 1 }
} },
{ "$group" : {
_id: null,
total: { $sum: 1 }
} }
]);
But that would discard the output of the previous stage (the first $group). Is there a way to combine the two operations while preserving each stage's output? I know the output of the whole aggregate operation can be converted to an array in some languages and its contents counted, but the pipeline output might exceed the 16 MB limit. Also, running the same query just to obtain the count seems wasteful.
So is obtaining the document result and count at the same time possible? Any help is appreciated.
Use $project to save the tag and count into tmp.
Use $push or $addToSet to collect tmp into your data list, counting the groups in the same $group.
Code:
db.test.aggregate([
  {$unwind: '$tags'},
  {$group: {_id: '$tags', count: {$sum: 1}}},
  {$project: {tmp: {tag: '$_id', count: '$count'}}},
  {$group: {_id: null, total: {$sum: 1}, data: {$addToSet: '$tmp'}}}
])
Output:
{
"result" : [
{
"_id" : null,
"total" : 5,
"data" : [
{
"tag" : "SOME",
"count" : 1
},
{
"tag" : "RANDOM",
"count" : 2
},
{
"tag" : "TAGS1",
"count" : 1
},
{
"tag" : "TAGS",
"count" : 1
},
{
"tag" : "SOME1",
"count" : 1
}
]
}
],
"ok" : 1
}
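Here is the idea of that answer in plain JavaScript: the final group both counts the per-tag groups (total) and collects them (data), shown over some hypothetical posts. It is only a sketch with no server. As the question points out, collecting every group into one document is still bound by the 16 MB document limit.

```javascript
// Hypothetical sample posts.
const posts = [
  { title: "a", tags: ["SOME", "RANDOM"] },
  { title: "b", tags: ["RANDOM", "TAGS"] },
];

// Imitates $unwind -> $group (count per tag) -> $project into tmp ->
// $group with _id: null that counts the groups and collects them.
function tagCountsWithTotal(input) {
  const counts = new Map();
  for (const post of input) {
    for (const tag of post.tags) counts.set(tag, (counts.get(tag) || 0) + 1);
  }
  const data = [...counts].map(([tag, count]) => ({ tag, count }));
  return { _id: null, total: data.length, data };
}

const summary = tagCountsWithTotal(posts);
console.log(summary.total); // 3
```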
I'm not sure you need the aggregation framework for this, other than for counting all the tags, e.g.:
db.posts.aggregate([
  { "$unwind" : "$tags" },
  { "$group" : {
    _id: { tag: "$tags" },
    count: { $sum: 1 }
  } }
]);
To paginate through the posts for a given tag, you can just use the normal query syntax, like so:
db.posts.find({tags: "RANDOM"}).skip(10).limit(10)
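Together, skip() and limit() select one page. With 1-based page numbers (an assumption), the offset is (page - 1) * pageSize, so .skip(10).limit(10) is page 2 with 10 items per page. This plain JavaScript sketch shows the same arithmetic on an in-memory array. It also shows how a total count turns into a number of pages for the paginator. paginate and pageCount are hypothetical helpers.

```javascript
// Hypothetical helpers; page numbers are assumed to start at 1.
function paginate(items, page, pageSize) {
  const skip = (page - 1) * pageSize;        // value passed to .skip()
  return items.slice(skip, skip + pageSize); // .limit(pageSize)
}

// Number of pages needed for `total` results, e.g. a tag's post count.
function pageCount(total, pageSize) {
  return Math.ceil(total / pageSize);
}

const ids = Array.from({ length: 25 }, (_, i) => i + 1);
console.log(paginate(ids, 2, 10)); // 11 through 20, like .skip(10).limit(10)
console.log(paginate(ids, 3, 10)); // [ 21, 22, 23, 24, 25 ]
console.log(pageCount(ids.length, 10)); // 3
```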