I'm trying to look up all records that match a certain condition, in this case fk being one of certain values, and then return only the top 2 results for each value, sorted by the name field.
This is what I have:
db.getCollection('col1').aggregate([
{$match: {fk: {$in: [1, 2]}}},
{$sort: {fk: 1, name: -1}},
{$group: {_id: "$fk", items: {$push: "$$ROOT"} }},
{$project: {items: {$slice: ["$items", 2]} }}
])
and it works, BUT, it's not guaranteed. According to this Mongo thread $group does not guarantee document order.
This would also mean that all of the suggested solutions here and elsewhere, which recommend using $unwind, followed by $sort, and then $group, would also not work, for the same reason.
What is the best way to accomplish this with Mongo (any version)? I've seen suggestions that this could be accomplished in the $project phase, but I'm not quite sure how.
You are correct in saying that the result of $group is never sorted.
$group does not order its output documents.
Hence sorting with
{$sort: {fk: 1}}
then grouping with
{$group: {_id: "$fk", ... }},
is wasted effort.
But there is a silver lining to sorting by name: -1 before the $group stage. Since you are using $push (not $addToSet), the pushed objects keep their incoming order in the newly created items array of the $group result. You can see this behaviour here (copy of your pipeline)
The items array will always have;
"items": [
{
..
"name": "Michael"
},
{
..
"name": "George"
}
]
in the same order, therefore your nested array sort is a non-issue! Though I am unable to find an exact quote in the documentation to confirm this behaviour, you can check;
this,
or this where it is confirmed.
Also, see the accumulator operator list for $group, where $addToSet has "Order of the array elements is undefined." in its description, whereas the similar operator $push does not, which might be indirect evidence? :)
Just a simple modification of your pipeline where you move the fk: 1 sort from pre-$group stage to post-$group stage;
db.getCollection('col1').aggregate([
{$match: {fk: {$in: [1, 2]}}},
{$sort: {name: -1}},
{$group: {_id: "$fk", items: {$push: "$$ROOT"} }},
{$sort: {_id: 1}},
{$project: {items: {$slice: ["$items", 2]} }}
])
should be sufficient to have the main result array order fixed as well. Check it on mongoplayground
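To see why sorting before $group is enough for the order inside each group, here is a minimal plain-JavaScript sketch of the same sort, push and slice steps. It does not talk to MongoDB; the helper name topNPerGroup and the sample documents are made up for illustration.

```javascript
// Plain-JavaScript imitation of: $sort by name desc -> $group with $push -> $slice.
// Hypothetical helper and sample data; this is not a MongoDB API.
function topNPerGroup(docs, n) {
  // Sort a copy by name descending, as {$sort: {name: -1}} would.
  const sorted = [...docs].sort((a, b) =>
    a.name < b.name ? 1 : a.name > b.name ? -1 : 0
  );
  const groups = new Map();
  for (const doc of sorted) {
    if (!groups.has(doc.fk)) groups.set(doc.fk, []);
    groups.get(doc.fk).push(doc); // appended in arrival order, like $push
  }
  // Order the groups by key (like the post-$group {$sort: {_id: 1}})
  // and keep the first n items of each (like $slice).
  return [...groups.entries()]
    .sort(([a], [b]) => a - b)
    .map(([fk, items]) => ({ _id: fk, items: items.slice(0, n) }));
}

const sample = [
  { fk: 1, name: "George" },
  { fk: 2, name: "Anna" },
  { fk: 1, name: "Michael" },
  { fk: 1, name: "Adam" },
  { fk: 2, name: "Zoe" },
];
const result = topNPerGroup(sample, 2);
```

Because each document is appended as it arrives, the per-group arrays inherit the order produced by the earlier sort, which is the property the pipeline relies on.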
$group doesn't guarantee the order of its output documents, but it keeps the grouped documents in sorted order within each bucket. So in your case, even though the documents after the $group stage are not sorted by fk, each group (items) will be sorted by name descending. If you would like to keep the documents sorted by fk, you can add {$sort:{_id:1}} after the $group stage, since fk becomes the group _id.
You could also sort by the order of the values passed in your match query, should you need to, by adding an extra field to each document. Something like
db.getCollection('col1').aggregate([
{$match: {fk: {$in: [1, 2]}}},
{$addFields:{ifk:{$indexOfArray:[[1, 2],"$fk"]}}},
{$sort: {ifk: 1, name: -1}},
{$group: {_id: "$ifk", items: {$push: "$$ROOT"}}},
{$sort: {_id : 1}},
{$project: {items: {$slice: ["$items", 2]}}}
])
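As a rough picture of what the ifk field does, here is a plain-JavaScript sketch that orders documents by the position of fk in the match list. The list [2, 1] and the sample documents are assumptions chosen so that the effect is visible; with [1, 2] the result would look the same as a plain ascending sort.

```javascript
// Imitates {$indexOfArray: [wanted, "$fk"]}: the position of fk in the match list.
const wanted = [2, 1]; // hypothetical match list, deliberately not ascending
const docs = [
  { fk: 1, name: "a" },
  { fk: 2, name: "b" },
  { fk: 1, name: "c" },
];

// Equivalent of the $addFields stage: attach the computed index to each document.
const withIndex = docs.map((d) => ({ ...d, ifk: wanted.indexOf(d.fk) }));

// Equivalent of {$sort: {ifk: 1, name: -1}}.
withIndex.sort(
  (x, y) => x.ifk - y.ifk || (x.name < y.name ? 1 : x.name > y.name ? -1 : 0)
);
```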
Update to allow sorting the array without the group operator: I've found the Jira ticket that is going to allow sorting arrays.
You could try the $project stage below to sort the array. There may be various ways to do it. This should sort names descending. It works, but it is a slower solution.
{"$project":{"items":{"$reduce":{
"input":"$items",
"initialValue":[],
"in":{"$let":{
"vars":{"othis":"$$this","ovalue":"$$value"},
"in":{"$let":{
"vars":{
//use index 0 while the accumulator array is still empty; otherwise count the accumulated elements that are greater than the current value, which is the position where the current value belongs.
"index":{"$cond":{
"if":{"$eq":["$$ovalue",[]]},
"then":0,
"else":{"$reduce":{
"input":"$$ovalue",
"initialValue":0,
"in":{"$cond":{
"if":{"$lt":["$$othis.name","$$this.name"]},
"then":{"$add":["$$value",1]},
"else":"$$value"}}}}
}}
},
//insert the current value at the found index
"in":{"$concatArrays":[
{"$slice":["$$ovalue","$$index"]},
["$$othis"],
{"$slice":["$$ovalue",{"$subtract":["$$index",{"$size":"$$ovalue"}]}]}]}
}}}}
}}}}
A simple example demonstrating how each iteration works:
db.b.insert({"items":[2,5,4,7,6,3]});
othis  ovalue       index  concat arrays (parts with counts)  return value
2      []           0      [],0         [2]  [],0             [2]
5      [2]          0      [],0         [5]  [2],-1           [5,2]
4      [5,2]        1      [5],1        [4]  [2],-1           [5,4,2]
7      [5,4,2]      0      [],0         [7]  [5,4,2],-3       [7,5,4,2]
6      [7,5,4,2]    1      [7],1        [6]  [5,4,2],-3       [7,6,5,4,2]
3      [7,6,5,4,2]  4      [7,6,5,4],4  [3]  [2],-1           [7,6,5,4,3,2]
Reference - Sorting Array with JavaScript reduce function
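For anyone who finds the nested $reduce hard to follow, the same insertion logic can be sketched in plain JavaScript. The helper names mongoSlice and sortDescending are mine, and mongoSlice only imitates the two-argument form of $slice as it is used in the walkthrough (a count of 0 gives an empty array).

```javascript
// Imitates the two-argument aggregation $slice: n >= 0 keeps the first n
// elements, n < 0 keeps the last |n| elements.
function mongoSlice(arr, n) {
  if (n >= 0) return arr.slice(0, n);
  return arr.slice(n);
}

// Descending insertion sort written as a reduce, following the $reduce stage.
function sortDescending(items) {
  return items.reduce((ovalue, othis) => {
    // index = how many already placed elements are greater than othis
    const index = ovalue.reduce(
      (count, el) => (othis < el ? count + 1 : count),
      0
    );
    // Insert othis at that index: head, current value, remaining tail.
    return [
      ...mongoSlice(ovalue, index),
      othis,
      ...mongoSlice(ovalue, index - ovalue.length),
    ];
  }, []);
}

const sortedItems = sortDescending([2, 5, 4, 7, 6, 3]);
```

Each step of the outer reduce corresponds to one row of the walkthrough: count the larger elements, then splice the current value in at that position.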
There is a bit of a red herring in the question, as $group does guarantee that it will process incoming documents in order (which is why you have to sort them before $group to get ordered arrays). But there is an issue with the way you propose doing it, since pushing all the documents into a single grouping is (a) inefficient and (b) could potentially exceed the maximum document size.
Since you only want top two, for each of the unique fk values, the most efficient way to accomplish it is via a "subquery" using $lookup like this:
db.coll.aggregate([
{$match: {fk: {$in: [1, 2]}}},
{$group:{_id:"$fk"}},
{$sort: {_id: 1}},
{$lookup:{
from:"coll",
as:"items",
let:{fk:"$_id"},
pipeline:[
{$match:{$expr:{$eq:["$fk","$$fk"]}}},
{$sort:{name:-1}},
{$limit:2},
{$project:{_id:0, fk:1, name:1}}
]
}}
])
Assuming you have an index on {fk:1, name:-1}, which you need anyway to get an efficient sort in your proposed code, the first two stages here will use that index via a DISTINCT_SCAN plan, which is very efficient, and for each resulting fk value, $lookup will use that same index to filter by a single value of fk and return results already sorted and limited to the first two. This will be the most efficient way to do this at least until https://jira.mongodb.org/browse/SERVER-9377 is implemented by the server.
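On newer servers there is another option worth knowing about. The sketch below assumes MongoDB 5.2 or later, where the $topN accumulator is available, and keeps only the top two documents per fk inside $group instead of pushing everything. The variable name topTwoPerFk is illustrative, and I have not compared its performance against the $lookup approach.

```javascript
// Sketch only: assumes MongoDB 5.2+ where the $topN accumulator exists.
const topTwoPerFk = [
  { $match: { fk: { $in: [1, 2] } } },
  {
    $group: {
      _id: "$fk",
      // Keep at most two documents per group, ordered by name descending.
      items: { $topN: { n: 2, sortBy: { name: -1 }, output: "$$ROOT" } },
    },
  },
  { $sort: { _id: 1 } },
];
// In the shell: db.coll.aggregate(topTwoPerFk)
```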
I have a mongo document that contains something like
{date: [2018, 3, 22]}
and when I try to project this into a flat JSON structure with these fields concatenated, I always get an array with 0 elements, eg. just extracting the year with
db.getCollection('blah').aggregate([
{$project: {year: "$date.0"}}
])
I get
{"year" : []}
even though matching on a similar expression works fine, eg.
db.getCollection('blah').aggregate([
{$match: {"date.0": 2018}}
])
selects the documents I would expect just fine.
What am I doing wrong? I've searched mongo documentation and stackoverflow but could find nothing.
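For $project you should use $arrayElemAt instead of dot notation, which works with a numeric index only in queries. In $project, "$date.0" is read as the field named 0 inside each array element, which is why you get an empty array.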
db.getCollection('blah').aggregate([
{$project: {year: { $arrayElemAt: [ "$date", 0 ] }}}
])
More here
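Since the goal was a flat structure, the same operator can be applied once per position. This is a small sketch that builds such a $project stage; flattenDateStage and the output field names year, month and day are illustrative choices, not something taken from the question.

```javascript
// Builds a $project stage that flattens a [year, month, day] array field.
// flattenDateStage and its parameter are hypothetical names for illustration.
function flattenDateStage(field) {
  // One $arrayElemAt expression per array position.
  const at = (i) => ({ $arrayElemAt: ["$" + field, i] });
  return { $project: { year: at(0), month: at(1), day: at(2) } };
}

const stage = flattenDateStage("date");
// In the shell: db.getCollection('blah').aggregate([stage])
```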
I have this huge dataset for which every entry has a datetime field. The data was inserted irregularly. For example:
2015-04-20 : 500 entries,
2015-04-23 : 300 entries,
2015-05-01 : 600 entries
The thing is, I do not know when these active days are. What I would like is a mongodb query which returns some sort of array containing all days which occur in the database, like so:
['2015-04-20',
'2015-04-23',
'2015-04-23',
'2015-04-25',
'2015-05-01',
'2015-05-05',
'2015-05-09']
Is this possible, and if so: how can I achieve this?
There is a "distinct" command with a shell wrapper, which can be used like this:
db.collection.distinct(dateFieldName, query)
If you are not running it from the shell, check whether your driver wraps this command. If not, you can use the command directly:
{ distinct: "<collection>", key: "<field>", query: <query> }
http://docs.mongodb.org/manual/reference/command/distinct/#dbcmd.distinct
If your timestamp field needs some additional processing, you can use the aggregation framework.
db.collection.aggregate([{$group: {_id: {$substr: ["$timestamp", 0, 10]}}}])
http://docs.mongodb.org/v2.6/core/aggregation-introduction/
Assuming a field named dateField that contains Date values, you can use the aggregation date operators with $group to do this.
It's easiest if you're using Mongo 3.x where the $dateToString operator is available:
db.dates.aggregate([
{$group: {
_id: {$dateToString: {format: '%Y-%m-%d', date: '$dateField'}},
count: {$sum: 1}
}},
{$sort: {count: -1}}
])
Prior to 3.0 you need to use multiple date operators to piece together the date into the _id when grouping:
db.dates.aggregate([
{$group: {
_id: {
year: {$year: '$dateField'},
month: {$month: '$dateField'},
day: {$dayOfMonth: '$dateField'}
},
count: {$sum: 1}
}},
{$sort: {count: -1}}
])
In both cases, note the use of $sort to order the results by the number of docs on each day, descending.
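If what you want is just the list of days in chronological order, as in the question, a variation is to drop the count and sort on _id instead. This is a sketch under the same assumptions (a Date field named dateField, $dateToString available). The helper distinctDayStrings is a made-up plain-JavaScript picture of the result, using UTC days.

```javascript
// Variation of the pipeline above: one document per day, oldest first.
const distinctDays = [
  { $group: { _id: { $dateToString: { format: "%Y-%m-%d", date: "$dateField" } } } },
  { $sort: { _id: 1 } },
];
// In the shell:
// db.dates.aggregate(distinctDays).toArray().map(function (d) { return d._id; })

// Plain-JavaScript picture of the same result (hypothetical helper, UTC days).
function distinctDayStrings(dates) {
  const days = new Set(dates.map((d) => d.toISOString().slice(0, 10)));
  return [...days].sort(); // YYYY-MM-DD strings sort chronologically
}

const days = distinctDayStrings([
  new Date("2015-04-23T10:00:00Z"),
  new Date("2015-04-20T01:00:00Z"),
  new Date("2015-04-23T23:59:00Z"),
]);
```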
I am trying to group by DayHours in a mongo aggregate function to get the past 24 hours of data.
For example: if the time of an event was 6:00 Friday the "DayHour" would be 6-5.
I'm easily able to group by hour with the following query:
db.api_log.aggregate([
{ '$group': {
'_id': {
'$hour': '$time'
},
'count': {
'$sum':1
}
}
},
{ '$sort' : { '_id': -1 } }
])
I feel like there is a better way to do this. I've tried concatenation in the $project statement, however you can only concatenate strings in Mongo (apparently).
I effectively just need to end up grouping by day and hour, however it gets done. Thank You.
I assume that the time field contains an ISODate.
If you want only last 24 hours you can use this:
var yesterday = new Date((new Date).setDate(new Date().getDate() - 1));
db.api_log.aggregate([
// keep only events newer than the cutoff
{$match: {time: {$gt: yesterday}}},
{$group: {
_id: {
hour: {$hour: "$time"},
day: {$dayOfMonth: "$time"}
},
count: {$sum: 1}
}}
])
If you want general grouping by day-hour you can use this:
db.api_log.aggregate([
{$group: {
_id: {
hour: {$hour: "$time"},
day: {$dayOfMonth: "$time"},
month: {$month: "$time"},
year: {$year: "$time"}
},
count: {$sum: 1}
}}
])
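Two small details may help here, sketched in plain JavaScript. The helper names are mine. The first computes a cutoff exactly 24 hours back from a given moment; the second builds the "hour-day" label from the question using JavaScript's UTC weekday numbering (0 = Sunday to 6 = Saturday). Note that MongoDB's $dayOfWeek numbers days from 1 (Sunday) to 7 (Saturday), so a Friday is 6 there rather than 5.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Cutoff exactly 24 hours before `now` (hypothetical helper).
function last24HoursCutoff(now) {
  return new Date(now.getTime() - DAY_MS);
}

// "hour-day" label as described in the question, using JavaScript's
// UTC weekday numbering: 0 = Sunday ... 6 = Saturday.
function dayHourKey(date) {
  return date.getUTCHours() + "-" + date.getUTCDay();
}

const friday6am = new Date("2015-04-24T06:00:00Z"); // a Friday
const key = dayHourKey(friday6am); // "6-5"
const cutoff = last24HoursCutoff(friday6am);
```

The cutoff can be passed to the $match stage in place of yesterday.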
This is not an answer per se (I do not have MongoDB at hand to come up with one), but I think that you cannot do this with the aggregation framework alone (I might be wrong, so I will explain myself).
You can obtain date and time information from the ObjectId using the .getTimestamp method. The problem is that you cannot output this information in a query (something like db.find({},{_id.getTimestamp}) does not work). You also cannot search by this field (except by using a $where clause).
So if it is possible to achieve, it can be done only using mapreduce, where in the reduce function you group based on the output of getTimestamp.
If this is a query you are going to run quite often, I would recommend actually adding a date field to your document, because with this field you will be able to aggregate your data properly, and you can also use indexes to avoid scanning the whole collection (as you are doing with $sort -1) and instead $match only the part that is newer than the current date minus 24 hours.
I hope this can help even without code. If no one is able to answer this, I will try to play with it tomorrow.
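For reference, the timestamp that getTimestamp returns is stored in the first four bytes of the ObjectId, so it can also be decoded on the client side from the hex string. A minimal sketch follows; objectIdHexToDate is a made-up helper name and the sample id is just an example value.

```javascript
// The first 4 bytes (8 hex characters) of an ObjectId hold the creation
// time as seconds since the Unix epoch.
function objectIdHexToDate(hex) {
  return new Date(parseInt(hex.substring(0, 8), 16) * 1000);
}

const created = objectIdHexToDate("507f1f77bcf86cd799439011");
```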
Consider a collection which contains documents with a date and a count field :
{ _id: ObjectId("..."), date: ISODate("..."), count: 3}
I would like to query the count by week, so I have to group the data by a date truncated to the beginning of the week.
But it seems there is no way to achieve that with the mongodb aggregation framework.
I was expecting to be able to do something like this ($dateOfWeek is a date operator I imagined to truncate the date at the beginning of the week) :
db.data.aggregate( [ {$project : { date: {$dateOfWeek: '$date'}, count:1}},
{ $group: {_id:'$date', count: {$sum: '$count'}}} ])
But I didn't find a suitable date operator to do it.
I know I can do it with mapreduce but it would be so much more elegant to have a date operator rather than writing javascript code.
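For later readers: on newer servers something close to the imagined operator exists. The sketch below assumes MongoDB 5.0 or later, where $dateTrunc is available, and assumes weeks start on Sunday, which I believe is the operator's default. The helper startOfWeekUTC is a made-up plain-JavaScript picture of what truncating to the week means in UTC.

```javascript
// Sketch assuming MongoDB 5.0+, where $dateTrunc is available.
const countByWeek = [
  {
    $group: {
      _id: { $dateTrunc: { date: "$date", unit: "week" } },
      count: { $sum: "$count" },
    },
  },
  { $sort: { _id: 1 } },
];
// In the shell: db.data.aggregate(countByWeek)

// Plain-JavaScript picture of the truncation: midnight UTC of the Sunday
// on or before the given date (hypothetical helper).
function startOfWeekUTC(date) {
  const midnight = Date.UTC(
    date.getUTCFullYear(),
    date.getUTCMonth(),
    date.getUTCDate()
  );
  return new Date(midnight - date.getUTCDay() * 24 * 60 * 60 * 1000);
}

const weekStart = startOfWeekUTC(new Date("2015-04-24T06:30:00Z"));
```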