MongoDB - Safely sort inner array after group - mongodb

I'm trying to look up all records that match a certain condition, in this case fk being one of certain values, and then return only the top 2 results for each fk, sorted by the name field.
This is what I have
db.getCollection('col1').aggregate([
{$match: {fk: {$in: [1, 2]}}},
{$sort: {fk: 1, name: -1}},
{$group: {_id: "$fk", items: {$push: "$$ROOT"} }},
{$project: {items: {$slice: ["$items", 2]} }}
])
and it works, BUT, it's not guaranteed. According to this Mongo thread $group does not guarantee document order.
This would also mean that all of the suggested solutions here and elsewhere, which recommend using $unwind, followed by $sort, and then $group, would also not work, for the same reason.
What is the best way to accomplish this with Mongo (any version)? I've seen suggestions that this could be accomplished in the $project phase, but I'm not quite sure how.

You are correct in saying that the result of $group is never sorted.
$group does not order its output documents.
Hence sorting with
{$sort: {fk: 1}}
then grouping with
{$group: {_id: "$fk", ... }},
will be a wasted effort.
But there is a silver lining to sorting by name: -1 before the $group stage. Since you are using $push (not $addToSet), the pushed objects keep their incoming order in the newly created items array of the $group result. You can see this behaviour here (copy of your pipeline)
The items array will always have;
"items": [
{
..
"name": "Michael"
},
{
..
"name": "George"
}
]
in the same order, therefore your nested array sort is a non-issue! Though I am unable to find an exact quote in the documentation to confirm this behaviour, you can check;
this,
or this where it is confirmed.
Also, in the accumulator operator list for $group, $addToSet has "Order of the array elements is undefined." in its description, whereas the similar operator $push does not, which might be indirect evidence? :)
Just a simple modification of your pipeline where you move the fk: 1 sort from pre-$group stage to post-$group stage;
db.getCollection('col1').aggregate([
{$match: {fk: {$in: [1, 2]}}},
{$sort: {name: -1}},
{$group: {_id: "$fk", items: {$push: "$$ROOT"} }},
{$sort: {_id: 1}},
{$project: {items: {$slice: ["$items", 2]} }}
])
should be sufficient to have the main result array order fixed as well. Check it on mongoplayground
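The ordering argument above can be checked outside MongoDB with a minimal plain-JavaScript sketch. It does not show how the server implements $group; it only mimics the $sort, $push, and $slice steps on hypothetical sample documents (the names and fk values are made up for illustration).

```javascript
// Hypothetical sample documents; field names follow the question.
const docs = [
  { fk: 1, name: "George" },
  { fk: 2, name: "Anna" },
  { fk: 1, name: "Michael" },
  { fk: 2, name: "Zoe" },
  { fk: 1, name: "Bob" },
  { fk: 2, name: "Carl" },
];

// Mimics: $sort {name: -1} -> $group with $push -> $sort {_id: 1} -> $slice.
function topPerGroup(input, limit) {
  // Sort a copy by name descending.
  const sorted = [...input].sort((a, b) =>
    a.name < b.name ? 1 : a.name > b.name ? -1 : 0
  );

  // Push each document into its group in arrival order, like $push.
  const groups = new Map();
  for (const doc of sorted) {
    if (!groups.has(doc.fk)) groups.set(doc.fk, []);
    groups.get(doc.fk).push(doc);
  }

  // Order the groups by key, then keep the first `limit` items of each.
  return [...groups.entries()]
    .sort(([a], [b]) => a - b)
    .map(([fk, items]) => ({ _id: fk, items: items.slice(0, limit) }));
}

const result = topPerGroup(docs, 2);
console.log(JSON.stringify(result, null, 2));
```

Because each group array is filled in arrival order, the items come out in name-descending order without any sort after the grouping step.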

$group doesn't guarantee the order of its output documents, but it keeps the grouped documents in their sorted order within each bucket. So in your case, even though the documents after the $group stage are not sorted by fk, each group (items) is sorted by name descending. If you would like to keep the documents sorted by fk, just add {$sort: {_id: 1}} after the $group stage (fk becomes _id once grouped).
You could also sort by the order of the values passed in your match query, should you need to, by adding an extra field to each document. Something like
db.getCollection('col1').aggregate([
{$match: {fk: {$in: [1, 2]}}},
{$addFields:{ifk:{$indexOfArray:[[1, 2],"$fk"]}}},
{$sort: {ifk: 1, name: -1}},
{$group: {_id: "$ifk", items: {$push: "$$ROOT"}}},
{$sort: {_id : 1}},
{$project: {items: {$slice: ["$items", 2]}}}
])
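The idea of ordering by position in the match list can be sketched in plain JavaScript. This is only an illustration on hypothetical data: `wanted` stands in for the array passed to $in, and `indexOf` plays the role of $indexOfArray.

```javascript
// Hypothetical input: the order of values in `wanted` is the desired output order.
const wanted = [2, 1];
const docs = [
  { fk: 1, name: "George" },
  { fk: 2, name: "Anna" },
  { fk: 1, name: "Michael" },
  { fk: 2, name: "Zoe" },
];

// Like $match followed by $addFields with $indexOfArray:
// tag each matching document with the position of its key in `wanted`.
const tagged = docs
  .filter((d) => wanted.includes(d.fk))
  .map((d) => ({ ...d, ifk: wanted.indexOf(d.fk) }));

// Like $sort {ifk: 1, name: -1}.
tagged.sort((a, b) => a.ifk - b.ifk || b.name.localeCompare(a.name));

const order = tagged.map((d) => `${d.fk}:${d.name}`);
console.log(order);
```

With `wanted = [2, 1]`, the documents for fk 2 come first even though 2 is the larger key.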
Update to allow array sort without the group operator: I've found the Jira ticket that is going to allow sorting arrays.
You could try the $project stage below to sort the array. There may be various ways to do it. This should sort names descending. It works, but it is a slower solution.
{"$project":{"items":{"$reduce":{
"input":"$items",
"initialValue":[],
"in":{"$let":{
"vars":{"othis":"$$this","ovalue":"$$value"},
"in":{"$let":{
"vars":{
// Use index 0 when the accumulator array is still empty; otherwise count the accumulator elements that are greater than the current value, which gives the position where the current value must be inserted.
"index":{"$cond":{
"if":{"$eq":["$$ovalue",[]]},
"then":0,
"else":{"$reduce":{
"input":"$$ovalue",
"initialValue":0,
"in":{"$cond":{
"if":{"$lt":["$$othis.name","$$this.name"]},
"then":{"$add":["$$value",1]},
"else":"$$value"}}}}
}}
},
//insert the current value at the found index
"in":{"$concatArrays":[
{"$slice":["$$ovalue","$$index"]},
["$$othis"],
{"$slice":["$$ovalue",{"$subtract":["$$index",{"$size":"$$ovalue"}]}]}]}
}}}}
}}}}
A simple example demonstrating how each iteration works:
db.b.insert({"items":[2,5,4,7,6,3]});
othis | ovalue      | index | concat arrays (parts with counts) | return value
2     | []          | 0     | [],0         [2]  [],0            | [2]
5     | [2]         | 0     | [],0         [5]  [2],-1          | [5,2]
4     | [5,2]       | 1     | [5],1        [4]  [2],-1          | [5,4,2]
7     | [5,4,2]     | 0     | [],0         [7]  [5,4,2],-3      | [7,5,4,2]
6     | [7,5,4,2]   | 1     | [7],1        [6]  [5,4,2],-3      | [7,6,5,4,2]
3     | [7,6,5,4,2] | 4     | [7,6,5,4],4  [3]  [2],-1          | [7,6,5,4,3,2]
Reference - Sorting Array with JavaScript reduce function
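For comparison, here is a minimal plain-JavaScript sketch of the same insertion idea using Array.prototype.reduce. It is not the aggregation pipeline itself; the function name `sortDescending` is made up for illustration, and it works on plain numbers like the demonstration above rather than on the name field.

```javascript
// Sorts descending by inserting each element at the position equal to the
// number of already-placed elements that are greater than it.
function sortDescending(items) {
  return items.reduce((acc, current) => {
    // Count the accumulator elements that are greater than `current`.
    const index = acc.reduce(
      (count, el) => (current < el ? count + 1 : count),
      0
    );
    // Insert `current` at that index.
    return [...acc.slice(0, index), current, ...acc.slice(index)];
  }, []);
}

const sorted = sortDescending([2, 5, 4, 7, 6, 3]);
console.log(sorted); // [ 7, 6, 5, 4, 3, 2 ]
```

Each step scans the whole accumulator, so the cost grows quadratically with the array length, which matches the remark that this is a slower solution.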

There is a bit of a red herring in the question, as $group does guarantee that it processes incoming documents in order (and that's why you have to sort them before $group to get ordered arrays). But there is an issue with the way you propose doing it, since pushing all the documents into a single grouping is (a) inefficient and (b) could potentially exceed the maximum document size.
Since you only want top two, for each of the unique fk values, the most efficient way to accomplish it is via a "subquery" using $lookup like this:
db.coll.aggregate([
{$match: {fk: {$in: [1, 2]}}},
{$group:{_id:"$fk"}},
{$sort: {_id: 1}},
{$lookup:{
from:"coll",
as:"items",
let:{fk:"$_id"},
pipeline:[
{$match:{$expr:{$eq:["$fk","$$fk"]}}},
{$sort:{name:-1}},
{$limit:2},
{$project:{_id:0, fk:1, name:1}}
]
}}
])
Assuming you have an index on {fk:1, name:-1}, as you must to get an efficient sort in your proposed code, the first two stages here will use that index via a DISTINCT_SCAN plan, which is very efficient. For each resulting fk, $lookup will use that same index to filter by a single value of fk and return results already sorted and limited to the first two. This will be the most efficient way to do this, at least until https://jira.mongodb.org/browse/SERVER-9377 is implemented by the server.
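The shape of this approach (find the distinct keys first, then run a small sorted, limited query per key) can be sketched in plain JavaScript. The sample documents are hypothetical, and the sketch scans an in-memory array, so it says nothing about index usage; it only shows that no group ever holds more than two documents.

```javascript
// Hypothetical sample documents; field names follow the question.
const docs = [
  { fk: 1, name: "George" },
  { fk: 2, name: "Anna" },
  { fk: 1, name: "Michael" },
  { fk: 3, name: "Ignored" },
  { fk: 2, name: "Zoe" },
  { fk: 1, name: "Bob" },
];

const byNameDesc = (a, b) => (a.name < b.name ? 1 : a.name > b.name ? -1 : 0);

// Step 1: distinct matching keys in ascending order,
// like $match + $group {_id: "$fk"} + $sort {_id: 1}.
const keys = [...new Set(docs.map((d) => d.fk))]
  .filter((fk) => [1, 2].includes(fk))
  .sort((a, b) => a - b);

// Step 2: one small "subquery" per key, like the $lookup sub-pipeline:
// filter by fk, sort by name descending, keep at most two documents.
const result = keys.map((fk) => ({
  _id: fk,
  items: docs.filter((d) => d.fk === fk).sort(byNameDesc).slice(0, 2),
}));

console.log(JSON.stringify(result, null, 2));
```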

Related

Mongo aggregator query: get list of documents depending on condition

I am working with 1,000 documents in a collection in MongoDB. Each document can be made of many topics, and a topic can be made of many keywords.
The mongo structure for each document is the following:
_id:ObjectId(6d5fc0922982bb550e08502d),
id_doc:"1234-678-436-42"
topic:Array
keywords:Array
The key topic is an array of objects of this type
type:"topic"
label:"work"
The keywords key, in turn, is an array of objects very similar to "topic":
type:"keyword"
value:"programmer"
label:"work"
Label represents in both cases the topic of the doc!
What I want is to list all the documents (id_doc) where a topic appears in the "topic" array, but never in the "keyword" array.
Query
takes the intersection of topic.label with keywords.label
if it is empty, there are no common members, so the document passes the filter
*Not sure if this is what you want; if not, please give 1-2 documents as JSON text and the expected output
Playmongo
db.collection.aggregate([
  {$match: {$expr: {$eq: [
    // labels that occur in both arrays
    {$setIntersection: ["$topic.label", "$keywords.label"]},
    // an empty intersection means no shared label
    []
  ]}}}
])
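The intersection test can be sketched in plain JavaScript as well. The two documents below are hypothetical and only follow the shape described in the question; `passesFilter` is an illustrative name, not part of any API.

```javascript
// Hypothetical documents shaped like the ones described in the question.
const docs = [
  {
    id_doc: "A",
    topic: [{ type: "topic", label: "work" }],
    keywords: [{ type: "keyword", value: "programmer", label: "work" }],
  },
  {
    id_doc: "B",
    topic: [{ type: "topic", label: "sport" }],
    keywords: [{ type: "keyword", value: "programmer", label: "work" }],
  },
];

// True when no topic label also appears as a keyword label,
// i.e. the intersection of the two label sets is empty.
function passesFilter(doc) {
  const keywordLabels = new Set(doc.keywords.map((k) => k.label));
  const common = doc.topic.map((t) => t.label).filter((l) => keywordLabels.has(l));
  return common.length === 0;
}

const matching = docs.filter(passesFilter).map((d) => d.id_doc);
console.log(matching); // [ 'B' ]
```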
One option is using $filter:
count the number of items in topic whose label is not present as the value of any item in keywords. Save this count as condMatch
$match only documents with condMatch greater than 0
db.collection.aggregate([
{$set: {condMatch: {
$size: {
$filter: {
input: "$topic",
cond: {$not: {$in: ["$$this.label", "$keywords.value"]}}
}
}
}
}
},
{$match: {condMatch: {$gt: 0}}},
{$unset: "condMatch"}
])
See how it works on the playground example
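A rough plain-JavaScript equivalent of the counting step is sketched below on a hypothetical document. Note that the pipeline compares topic labels against keywords.value, while the question says the label field carries the topic in both arrays, so the sketch takes the keyword field as a parameter (`keywordField` is an illustrative name) to make the difference visible.

```javascript
// Hypothetical document shaped like the ones described in the question.
const doc = {
  id_doc: "A",
  topic: [
    { type: "topic", label: "work" },
    { type: "topic", label: "sport" },
  ],
  keywords: [{ type: "keyword", value: "programmer", label: "work" }],
};

// Counts the topics whose label is not found in the chosen keyword field,
// like $size over $filter with $not / $in.
function countUnmatched(d, keywordField = "value") {
  const known = d.keywords.map((k) => k[keywordField]);
  return d.topic.filter((t) => !known.includes(t.label)).length;
}

const byValue = countUnmatched(doc);          // compares against keywords.value
const byLabel = countUnmatched(doc, "label"); // compares against keywords.label
console.log(byValue, byLabel); // 2 1
```

A document passes the $match when the count is greater than 0, so which field is compared decides which documents are returned.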

How MongoDB Handles $addFields on Array Elements that Do Not Exist

If I use $addFields as an aggregation stage for an element that doesn't actually exist, how will Mongo handle that?
I ask because, in my data, for one particular field that is an array, some documents have 1 array element, and others have 2 or 3 (so position 0, 1, and 2).
So if I do this:
$addFields : {
secondAgency: {$arrayElemAt:["$agency", 1]},
}
How will Mongo handle documents where there is no element in position 1 of the agency array? Is there a way I could handle this with a conditional check -- to only include this field if there is an array element in position 1 of the array?
It'll simply ignore the value and won't add the field: $arrayElemAt returns no result when the index is out of bounds. To add a default value, use $ifNull:
db.getCollection('x').aggregate([
{$addFields: {
secondAgency1: {$arrayElemAt: ["$agency", 1]},
secondAgency2: {$ifNull: [{$arrayElemAt: ["$agency", 1]}, "DEFAULT"]}
}}
])
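The difference between the two fields can be mimicked in plain JavaScript. This is only an analogy on hypothetical documents: an out-of-range array index yields undefined in JavaScript, and the `??` operator plays the role of $ifNull.

```javascript
// Hypothetical documents: one with a single agency, one with three.
const docs = [
  { _id: 1, agency: ["A1"] },
  { _id: 2, agency: ["A1", "A2", "A3"] },
];

const result = docs.map((doc) => {
  const second = doc.agency[1]; // undefined when there is no element at position 1
  // Like $ifNull: fall back to a default when the value is missing.
  const out = { ...doc, secondAgency2: second ?? "DEFAULT" };
  // Only add secondAgency1 when a value exists, mirroring the omitted field.
  if (second !== undefined) out.secondAgency1 = second;
  return out;
});

console.log(result);
```

The first document ends up without secondAgency1 but with secondAgency2 set to "DEFAULT"; the second document gets "A2" in both fields.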

How do I project an element of an array in mongo?

I have a mongo document that contains something like
{date: [2018, 3, 22]}
and when I try to project this into a flat JSON structure with these fields concatenated, I always get an array with 0 elements, eg. just extracting the year with
db.getCollection('blah').aggregate([
{$project: {year: "$date.0"}}
])
I get
{"year" : []}
even though matching on a similar expression works fine, eg.
db.getCollection('blah').aggregate([
{$match: {"date.0": 2018}}
])
selects the documents I would expect just fine.
What am I doing wrong? I've searched mongo documentation and stackoverflow but could find nothing.
For $project you should use $arrayElemAt instead of dot notation with a numeric index, which works only in queries.
db.getCollection('blah').aggregate([
{$project: {year: { $arrayElemAt: [ "$date", 0 ] }}}
])
More here

Meteor + Mongo (2.6.7) Pushing Document to Array in Sorted Order

I have a document with an array (which should be denormalised, but can't be because the reactive events will fire "add" too many times at client startup).
I need to be able to push a document to that array, and keep it in sorted (or roughly sorted) order. I've tried this query:
{ $push: {
'events': {
$each: [{'id': new Mongo.ObjectID, 'start':startDate,...}],
$sort: {'start': 1},
$slice: -1
}
}
}
But it requires the $slice operator to be present... I don't want to delete all my old data, I just want to be able to insert data into an array, and then have that array be sorted so that I can query the array later and say "slice greater than or equal to time X".
Is this possible?
Edit:
This mongo aggregate query nearly works, except for one level of document in the result array, but aggregating is not reactive (probably because they're expensive computations). Here is the aggregate query if anyone can see how to translate it to a find, or why it can't be translated:
Coll.aggregate({$unwind: '$events'},
{$sort: {'events.start':1}},
{$match: {'events.start': {$gte: new Date()}}},
{$group: {_id: '$_id', 'events': {$push: '$events'} }})

Mongodb aggregate query help - grouping with multiple fields and converting to an array

I have the following document in the mongodb collection
[{quarter:'Q1',project:'project1',user:'u1',cost:'100'},
{quarter:'Q2',project:'project1',user:'u2',cost:'100'},
{quarter:'Q3',project:'project1',user:'u1',cost:'200'},
{quarter:'Q1',project:'project2',user:'u2',cost:'200'},
{quarter:'Q2',project:'project2',user:'u1',cost:'300'},
{quarter:'Q3',project:'project2',user:'u2',cost:'300'}]
I need to generate an output which sums the cost by quarter and project and puts it in a format that can be rendered in the Extjs chart.
[{quarter:'Q1','project1':100,'project2':200,'project3':300},
{quarter:'Q2','project1':100,'project2':200,'project3':300},
{quarter:'Q3','project1':100,'project2':200,'project3':300}]
I have tried various permutations and combinations of aggregates but couldn't really come up with a pipeline. Your help or direction is greatly appreciated.
Your cost data appears to be strings, which isn't helping, but assuming you get around that:
The main component is the $cond operator in the document projection. Assuming your data is larger and you want to group the results:
db.mstats.aggregate([
// Optionally match first depending on what you are doing
// Sum up cost for each quarter and project
{$group: {_id: { quarter: "$quarter", project: "$project" }, cost: {$sum: "$cost" }}},
// Change the "projection" in $group, using $cond to add a key per "project" value
// We use $sum and the false case of 0 to fill in values not in the row.
// These will then group on the key adding the real cost and 0 together.
{$group: {
_id: "$_id.quarter",
project1: {$sum: {$cond:[ {$eq: [ "$_id.project", "project1" ]}, "$cost", 0 ]}},
project2: {$sum: {$cond:[ {$eq: [ "$_id.project", "project2" ]}, "$cost", 0 ]}}
}},
// Change the document to have the "quarter" key
{$project: { _id:0, quarter: "$_id", project1: 1, project2: 1}},
// Optionally sort by quarter
{$sort: {quarter: 1 }}
])
So after doing the initial grouping, the document is altered with the use of $cond to determine whether the value of a key goes into the new key name. Essentially this asks: if the current value of project is "project1", then put the cost value into the project1 key. And so on.
As we put a 0 value into this new document key when there was no match, we need to group the results again in order to merge the documents. Sorting is optional, but probably what you want for a chart.
Naturally you will have to build this up dynamically and probably query for the project keys that you want. But otherwise this should be what you are looking for.
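As an illustration of the pivot itself, here is a minimal plain-JavaScript sketch run on the sample documents from the question. It is not the aggregation pipeline: the function name `pivotByQuarter` is made up, the project keys are discovered dynamically, and the string costs are converted with Number() before summing.

```javascript
// Sample documents from the question; note that cost is stored as a string.
const rows = [
  { quarter: "Q1", project: "project1", user: "u1", cost: "100" },
  { quarter: "Q2", project: "project1", user: "u2", cost: "100" },
  { quarter: "Q3", project: "project1", user: "u1", cost: "200" },
  { quarter: "Q1", project: "project2", user: "u2", cost: "200" },
  { quarter: "Q2", project: "project2", user: "u1", cost: "300" },
  { quarter: "Q3", project: "project2", user: "u2", cost: "300" },
];

function pivotByQuarter(input) {
  // Collect the project names so every row gets a key per project.
  const projects = [...new Set(input.map((r) => r.project))];
  const byQuarter = new Map();

  for (const { quarter, project, cost } of input) {
    if (!byQuarter.has(quarter)) {
      // Start each project at 0, like the false branch of $cond.
      const zeros = Object.fromEntries(projects.map((p) => [p, 0]));
      byQuarter.set(quarter, { quarter, ...zeros });
    }
    // Convert the string cost to a number before adding it up.
    byQuarter.get(quarter)[project] += Number(cost);
  }

  // Sort by quarter, which is what a chart usually expects.
  return [...byQuarter.values()].sort((a, b) => a.quarter.localeCompare(b.quarter));
}

const pivoted = pivotByQuarter(rows);
console.log(pivoted);
```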