I have the following kind of docs in a MongoDB collection:
{ _id:xx,
iddoc:yy,
type1:"sometype1",
type2:"sometype2",
date:
{
year:2015,
month:4,
day:29,
type:"day"
},
count:23
}
I would like to do a sum over the field count grouping by iddoc for all docs where:
type1 in ["type1A","type1B",...]
type2 in ["type2A","type2B",...]
date.year: 2015,
date.month: 4,
date.type: "day"
date.day between 4 and 7
I would like then to sort these sums.
I think this is probably easy to do with MongoDB's aggregation framework, but I am new to it and would appreciate a tip to get started.
This is straightforward to do with an aggregation pipeline:
db.test.aggregate([
// Filter the docs based on your criteria
{$match: {
type1: {$in: ['type1A', 'type1B']},
type2: {$in: ['type2A', 'type2B']},
'date.year': 2015,
'date.month': 4,
'date.type': 'day',
'date.day': {$gte: 4, $lte: 7}
}},
// Group by iddoc and sum the count field of each group
{$group: {
_id: '$iddoc',
sum: {$sum: '$count'}
}},
// Sort by sum, descending
{$sort: {sum: -1}}
])
If I understood you correctly:
db.col.aggregate
(
[{
$match:
{
type1: {$in: ["type1A", "type1B",...]},
type2: {$in: ["type2A", "type2B",...]},
"date.year": 2015,
"date.month": 4,
"date.day": {$gte: 4, $lte: 7},
"date.type": "day"
}
},
{
$group:
{
_id: "$iddoc",
total_count: {$sum: "$count"}
}
},
{ $sort: {total_count: 1}}]
)
This filters date.day between 4 and 7 inclusive (to exclude the endpoints, use $gt and $lt instead). The results are sorted from lowest to highest (ascending); for a descending sort, use:
{ $sort: {total_count: -1}}
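Note that {$sum: "$count"} adds up the count values, whereas {$sum: 1} would only count the documents in each group. As a rough sketch in plain JavaScript (no MongoDB involved; the sample documents and the helper name sumCountByIddoc are made up for illustration), this is roughly what the $match, $group and $sort stages compute:

```javascript
// Hypothetical sample documents shaped like the ones in the question.
const docs = [
  {iddoc: 'a', type1: 'type1A', type2: 'type2A', date: {year: 2015, month: 4, day: 4, type: 'day'}, count: 23},
  {iddoc: 'a', type1: 'type1B', type2: 'type2A', date: {year: 2015, month: 4, day: 6, type: 'day'}, count: 7},
  {iddoc: 'b', type1: 'type1A', type2: 'type2B', date: {year: 2015, month: 4, day: 5, type: 'day'}, count: 50},
  // Filtered out: day is outside 4..7
  {iddoc: 'b', type1: 'type1A', type2: 'type2B', date: {year: 2015, month: 4, day: 9, type: 'day'}, count: 99},
  // Filtered out: type1 is not in the list
  {iddoc: 'c', type1: 'other', type2: 'type2A', date: {year: 2015, month: 4, day: 5, type: 'day'}, count: 1}
];

function sumCountByIddoc(docs, type1List, type2List) {
  // $match: keep only documents that satisfy every condition
  const matched = docs.filter(d =>
    type1List.includes(d.type1) &&
    type2List.includes(d.type2) &&
    d.date.year === 2015 && d.date.month === 4 && d.date.type === 'day' &&
    d.date.day >= 4 && d.date.day <= 7);

  // $group: add up the count field per iddoc, i.e. {$sum: "$count"}
  const totals = new Map();
  for (const d of matched) {
    totals.set(d.iddoc, (totals.get(d.iddoc) || 0) + d.count);
  }

  // $sort: descending by total_count
  return [...totals]
    .map(([_id, total_count]) => ({_id, total_count}))
    .sort((x, y) => y.total_count - x.total_count);
}

const result = sumCountByIddoc(docs, ['type1A', 'type1B'], ['type2A', 'type2B']);
console.log(result);
```

With {$sum: 1} both groups in this sample would report 2 instead of their summed counts.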
I've been trying to find a way to limit the number of objects I'm pushing to the arrays I'm creating while using "aggregate" on a MongoDB collection.
I have a collection of students - each has these relevant keys:
class number it takes this semester (only one value),
percentile in class (exists if is enrolled in class, null if not),
current score in class (> 0 if enrolled in class, else - 0),
total average (GPA),
max grade
I need to group all students who never failed, per class, in one array that contains those with a GPA higher than 80, and another array containing those without this GPA, sorted by their score in this specific class.
This is my query:
db.getCollection("students").aggregate([
{"$match": {
"class_number":
{"$in": [49, 50, 16]},
"grades.curr_class.percentile":
{"$exists": true},
"grades.min": {"$gte": 80},
}},
{"$sort": {"grades.curr_class.score": -1}},
{"$group": {"_id": "$class_number",
"studentsWithHighGPA":
{"$push":
{"$cond": [{"$gte": ["$grades.gpa", 80]},
{"id": "$_id"},
"$$REMOVE"]
}
},
"studentsWithoutHighGPA":
{"$push":
{"$cond": [{"$lt": ["$grades.gpa", 80]},
{"id": "$_id"},
"$$REMOVE"]
},
},
},
},
])
What I'm trying to do is limit the number of students in each of these arrays. I only want the top 16 in each array, but I'm not sure how to approach this.
Thanks in advance!
I've tried using $limit in different variations, and $slice too, but none seem to work.
Since MongoDB version 5.0, one option is to use $setWindowFields for this, in particular its $rank operator. This lets you keep only the relevant students and limit their number even before the $group step:
1. $match only the relevant students, as suggested by the OP
2. $set the groupId for $setWindowFields (as it can currently partition by one key only)
3. $setWindowFields to define the rank of each student in their array
4. $match only students with the wanted rank
5. $group by class_number, as suggested by the OP:
db.collection.aggregate([
{$match: {
class_number: {$in: [49, 50, 16]},
"grades.curr_class.percentile": {$exists: true},
"grades.min": {$gte: 80}
}},
{$set: {
groupId: {$concat: [
{$toString: "$class_number"},
{$toString: {$toBool: {$gte: ["$grades.gpa", 80]}}}
]}
}},
{$setWindowFields: {
partitionBy: "$groupId",
sortBy: {"grades.curr_class.score": -1},
output: {rank: {$rank: {}}}
}},
{$match: {rank: {$lte: 16}}}, // top 16 per array, as asked in the question
{$group: {
_id: "$class_number",
studentsWithHighGPA: {$push: {
$cond: [{$gte: ["$grades.gpa", 80]}, {id: "$_id"}, "$$REMOVE"]}},
studentsWithoutHighGPA: {$push: {
$cond: [{$lt: ["$grades.gpa", 80]}, {id: "$_id"}, "$$REMOVE"]}}
}}
])
See how it works on the playground example
*This solution limits the students by rank, so an array can end up with more than n students when several students tie at exactly rank n. This can be solved by adding a $slice step.
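To illustrate the tie edge case, here is a small plain JavaScript sketch (not MongoDB code; rankDescending and the sample students are hypothetical) that ranks the way $rank does, with equal scores sharing a rank and the next rank skipping ahead:

```javascript
// Rank by score, highest first. Equal scores share a rank and the
// following rank skips ahead (1, 2, 2, 4), which mirrors $rank.
function rankDescending(students) {
  const sorted = [...students].sort((a, b) => b.score - a.score);
  let rank = 0;
  return sorted.map((s, i) => {
    if (i === 0 || s.score !== sorted[i - 1].score) {
      rank = i + 1;
    }
    return {...s, rank};
  });
}

// Hypothetical data: two students tie for second place.
const students = [
  {id: 1, score: 90},
  {id: 2, score: 85},
  {id: 3, score: 85},
  {id: 4, score: 70}
];

const limit = 2;
// Filtering on rank keeps three students, one more than the limit...
const kept = rankDescending(students).filter(s => s.rank <= limit);
// ...so a final slice trims the array to the wanted size.
const trimmed = kept.slice(0, limit);
console.log(kept.length, trimmed.length);
```

In the pipeline, the counterpart of the last step would be an extra stage after $group that applies the $slice operator to each array.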
Maybe MongoDB's $facet stage is a solution. You can specify different output pipelines in one aggregation call.
Something like this:
const pipeline = [
{
'$facet': {
'studentsWithHighGPA': [
{ '$match': { 'grade': { '$gte': 80 } } },
{ '$sort': { 'grade': -1 } },
{ '$limit': 16 }
],
'studentsWithoutHighGPA': [
{ '$match': { 'grade': { '$lt': 80 } } },
{ '$sort': { 'grade': -1 } },
{ '$limit': 16 }
]
}
}
];
coll.aggregate(pipeline)
This should end up with one document including two arrays.
studentsWithHighGPA (array)
0 (object)
1 (object)
...
studentsWithoutHighGPA (array)
0 (object)
1 (object)
See each facet as an aggregation pipeline on its own. So you can also include $group to group by classes or something else.
https://www.mongodb.com/docs/manual/reference/operator/aggregation/facet/
I don't think there is a mongodb-provided operator to apply a limit inside of a $group stage.
You could use $accumulator, but that requires server-side scripting to be enabled, and may have performance impact.
Limiting studentsWithHighGPA to 16 throughout the grouping might look something like:
"studentsWithHighGPA": {
  "$accumulator": {
    // Start each group with an empty list
    init: function() {
      return {combined: []};
    },
    // Add students with a GPA of at least 80, keeping at most 16 per partial state
    accumulate: function(state, id, score) {
      if (score >= 80) {
        state.combined.push({_id: id, score: score});
      }
      return {combined: state.combined.slice(0, 16)};
    },
    accumulateArgs: ["$_id", "$grades.gpa"],
    // Combine partial states (e.g. from different shards), highest score first
    merge: function(A, B) {
      return {
        combined: A.combined.concat(B.combined).sort(function(SA, SB) {
          return SB.score - SA.score;
        })
      };
    },
    // Keep the top 16 and drop the score from the output
    finalize: function(s) {
      return s.combined.slice(0, 16).map(function(A) {
        return {_id: A._id};
      });
    },
    lang: "js"
  }
}
Note that the score is also carried through until the very end so that partial result sets from different shards can be combined properly.
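To see why the score has to travel with each entry, here is a rough standalone sketch of the merge and finalize steps in plain JavaScript. The two partial states and the LIMIT constant are made up for illustration; the answer above uses 16.

```javascript
const LIMIT = 2; // hypothetical; kept small so the example is easy to follow

// Same idea as the merge function above: concatenate, then sort by score descending.
function merge(A, B) {
  return {
    combined: A.combined.concat(B.combined).sort((x, y) => y.score - x.score)
  };
}

// Same idea as the finalize function above: trim, then drop the score.
function finalize(s) {
  return s.combined.slice(0, LIMIT).map(e => ({_id: e._id}));
}

// Hypothetical partial results, as two shards might produce them.
const shard1 = {combined: [{_id: 's1', score: 95}, {_id: 's2', score: 81}]};
const shard2 = {combined: [{_id: 's3', score: 90}, {_id: 's4', score: 88}]};

const top = finalize(merge(shard1, shard2));
console.log(top);
```

If the partial states only held ids, there would be no way to tell that s3 from the second shard outranks s2 from the first.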
I'd like to sort a collection, then add a virtual property to each result holding its numerical position in the sorted order.
So for example, we have a collection called calls, and we'd like to ascertain the current call queue priority as a number so it can be synced to our CRM via reverse ETL.
We have to do this inside the query itself because we don't have an intermediary step where we could add logic to compute it.
So my current query is
db.getCollection('callqueues').aggregate([
{
$match: {
'invalidated': false,
'assigned_agent': null
}
},
{ $sort: {
score: -1, _id: -1
} },
{
$addFields: {
order: "<NEW ORDER PROPERTY HERE>",
}
},
])
So I was wondering how I would insert their order as a virtual property, where the first element after the sort should be 1, the second 2, etc.
One option (since MongoDB version 5.0) is to use $setWindowFields for this:
db.collection.aggregate([
{$match: {invalidated: false, assigned_agent: null}},
{$setWindowFields: {
sortBy: {score: -1, _id: -1},
output: {
order: {
$sum: 1,
window: {documents: ["unbounded", "current"]}
}
}
}}
])
See how it works on the playground example
EDIT: If your MongoDB version is earlier than 5.0, you can use a less efficient query involving $group and $unwind:
db.collection.aggregate([
{$match: {invalidated: false, assigned_agent: null}},
{$sort: {score: -1, _id: -1}},
{$group: {_id: 0, data: {$push: "$$ROOT"}}},
{$unwind: {path: "$data", includeArrayIndex: "order"}},
{$replaceRoot: {newRoot: {$mergeObjects: ["$data", {order: {$add: ["$order", 1]}}]}}}
])
See how it works on the playground example < 5.0
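Both variants aim at the same result. As a rough plain JavaScript sketch (no MongoDB involved; the sample calls are hypothetical), the running {$sum: 1} over the window from "unbounded" to "current" amounts to numbering the documents in sorted order:

```javascript
// Hypothetical documents shaped like the ones the query matches on.
const calls = [
  {_id: 1, score: 10, invalidated: false, assigned_agent: null},
  {_id: 2, score: 30, invalidated: false, assigned_agent: null},
  {_id: 3, score: 30, invalidated: false, assigned_agent: null},
  {_id: 4, score: 99, invalidated: true, assigned_agent: null},
  {_id: 5, score: 50, invalidated: false, assigned_agent: 'agent-7'}
];

const queue = calls
  // $match: only valid, unassigned calls
  .filter(c => c.invalidated === false && c.assigned_agent === null)
  // sortBy: score descending, then _id descending as a tie-breaker
  .sort((a, b) => (b.score - a.score) || (b._id - a._id))
  // order: position in the sorted result, starting at 1
  .map((c, i) => ({...c, order: i + 1}));

console.log(queue.map(c => [c._id, c.order]));
```

The _id tie-breaker matters here: without it, the two calls with score 30 could swap positions between runs.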
I have seen very similar questions with solutions to this problem, but I am unsure how I would incorporate them into my own query. I'm programming in Scala and using the MongoDB Aggregates "framework".
val getItems = Seq (
Aggregates.lookup(Store...)...
Aggregates.lookup(Store.STORE_NAME, "relationship.itemID", "uniqueID", "item"),
Aggregates.unwind("$item"),
// filter duplicates here ?
Aggregates.lookup(Store.STORE_NAME, "item.content", "ID", "content"),
Aggregates.unwind("$content"),
Aggregates.project(Projections.fields(Projections.include("store", "item", "content")))
)
The query returns duplicate objects which is undesirable. I would like to remove these. How could I go about incorporating Aggregates.group and "$addToSet" to do this? Or any other reasonable solution would be great too.
Note: I have to omit some details about the query, so the store lookup aggregate is not there. However, I want to remove the duplicates later in the query so it hopefully shouldn't matter.
Please let me know if I need to provide more information.
Thanks.
EDIT: 31/ 07/ 2019: 13:47
I have tried the following:
val getItems = Seq (
Aggregates.lookup(Store...)...
Aggregates.lookup(Store.STORE_NAME, "relationship.itemID", "uniqueID", "item"),
Aggregates.unwind("$item"),
Aggregates.group("$item.itemID",
Accumulators.first("ID", "$ID"),
Accumulators.first("itemName", "$itemName"),
Accumulators.addToSet("item", "$item")),
Aggregates.unwind("$items"),
Aggregates.lookup(Store.STORE_NAME, "item.content", "ID", "content"),
Aggregates.unwind("$content"),
Aggregates.project(Projections.fields(Projections.include("store", "items", "content")))
)
But my query now returns zero results instead of the duplicate result.
You can use $first to remove the duplicates.
Suppose I have the following data:
[
{"_id": 1,"item": "ABC","sizes": ["S","M","L"]},
{"_id": 2,"item": "EFG","sizes": []},
{"_id": 3, "item": "IJK","sizes": "M" },
{"_id": 4,"item": "LMN"},
{"_id": 5,"item": "XYZ","sizes": null
}
]
Now, let's aggregate it using $first and $unwind and see the difference:
First let's aggregate it using $first
db.collection.aggregate([
{ $sort: {
item: 1
}
},
{ $group: {
_id: "$item",firstSize: {$first: "$sizes"}}}
])
Output
[
{"_id": "XYZ","firstSize": null},
{"_id": "ABC","firstSize": ["S","M","L" ]},
{"_id": "IJK","firstSize": "M"},
{"_id": "EFG","firstSize": []},
{"_id": "LMN","firstSize": null}
]
Now, Let's aggregate it using $unwind
db.collection.aggregate([
{
$unwind: "$sizes"
}
])
Output
[
{"_id": 1,"item": "ABC","sizes": "S"},
{"_id": 1,"item": "ABC","sizes": "M"},
{"_id": 1,"item": "ABC","sizes": "L"},
{"_id": 3,"item": "IJK","sizes": "M"}
]
You can see that $first removes the duplicates, whereas $unwind keeps them.
Using $unwind and $first together:
db.collection.aggregate([
{ $unwind: "$sizes"},
{
$group: {
_id: "$item",firstSize: {$first: "$sizes"}}
}
])
Output
[
{"_id": "IJK", "firstSize": "M"},
{"_id": "ABC","firstSize": "S"}
]
Using $group and then $addToSet is an effective way to deal with your problem.
It looks like this in the mongo shell:
db.sales.aggregate(
[
{
$group:
{
_id: { day: { $dayOfYear: "$date"}, year: { $year: "$date" } },
itemsSold: { $addToSet: "$item" }
}
}
]
)
In Scala you can do it like this:
Aggregates.group("$groupfield", Accumulators.addToSet("fieldName","$expression"))
If you have multiple fields to group by:
Aggregates.group(new BasicDBObject().append("fieldAname","$fieldA").append("fieldBname","$fieldB"), Accumulators.addToSet("fieldName","$expression"))
Then unwind the resulting array.
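As a rough illustration of why grouping with $addToSet and then unwinding removes duplicates, here is a plain JavaScript sketch (not driver code; the rows and field names are hypothetical):

```javascript
// Hypothetical rows as they might look after a lookup + unwind, with one duplicate.
const rows = [
  {store: 'A', itemID: 1},
  {store: 'A', itemID: 1}, // duplicate of the row above
  {store: 'A', itemID: 2},
  {store: 'B', itemID: 1}
];

// group + addToSet: one entry per store, each holding a set of distinct item IDs
const grouped = new Map();
for (const r of rows) {
  if (!grouped.has(r.store)) {
    grouped.set(r.store, new Set());
  }
  grouped.get(r.store).add(r.itemID);
}
const deduped = [...grouped].map(([_id, items]) => ({_id, items: [...items]}));

// unwind: back to one row per (store, item) pair, now without the duplicate
const unwound = deduped.flatMap(g => g.items.map(item => ({_id: g._id, item})));
console.log(unwound);
```

Note that the unwind step has to reference the same field name that the accumulator wrote to (items in this sketch); unwinding a field that does not exist yields no documents.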
Following this question's answer (https://stackoverflow.com/a/20817040/2656506) I was able to group a field based on its first character with this command:
db.kits.aggregate({ $group: {_id: {$substr: ['$kit', 0, 1]}, count: {$sum: 1}}})
But I can't figure out how I can additionally group only those documents which match an additional condition like _id: 'abc' in the same query. Can it be done in one query?
Thanks in advance!
Add a $match pipeline stage to your aggregation query:
db.kits.aggregate(
[
{
$match: {
_id: 'abc'
}
},
{
$group: {
_id: {
$substr: ['$kit', 0, 1]
},
count: {$sum: 1}
}
}
]
)
I'm trying to get the count of certain items grouped on certain dates.
This is working using the following aggregate query:
// this query works, without matching dates
[
{'$match': {
'some_id': ObjectId('foobar'),
'some_boolean_value': true
}
},
{'$project':
{'day':
{'$substr': ['$some_date', 0, 10]}}
},
{'$group': {_id: '$day', count: { '$sum': 1 }}},
{'$sort': {_id: -1}}
]
The next step is that I want to use this query but with date limits.
I want the count, grouped per day, between certain date limits.
// the query below does not work as soon as date matching is added
// this query always return 0 documents
[
{'$match': {
'some_id': ObjectId('foobar'),
'some_boolean_value': true,
'some_date':
{
'$gte': '2015-08-01T00:00:00.000Z',
'$lte': '2015-08-31T23:59:59.999Z'
}
}
},
{'$project':
{'day':
{'$substr': ['$some_date', 0, 10]}}
},
{'$group': {_id: '$day', count: { '$sum': 1 }}},
{'$sort': {_id: -1}}
]
You want to match only documents inside a given datetime window, but the bounds in your $match are strings, so the comparison is a string comparison rather than a date comparison.
Therefore replace this:
'$gte': '2015-08-01T00:00:00.000Z',
'$lte': '2015-08-31T23:59:59.999Z'
with this:
'$gte': new Date('2015-08-01T00:00:00.000Z'),
'$lte': new Date('2015-08-31T23:59:59.999Z')
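As far as I know, MongoDB's range operators only match values of a comparable type, so string bounds never match a field that stores real dates. A rough plain JavaScript sketch of what the corrected pipeline computes (the sample events are hypothetical, and toISOString().slice(0, 10) stands in for the $substr step):

```javascript
// Hypothetical documents with real Date values in some_date.
const events = [
  {some_date: new Date('2015-07-31T23:59:59.999Z')}, // just before the window
  {some_date: new Date('2015-08-01T00:00:00.000Z')},
  {some_date: new Date('2015-08-01T15:30:00.000Z')},
  {some_date: new Date('2015-08-31T23:59:59.999Z')},
  {some_date: new Date('2015-09-01T00:00:00.000Z')}  // just after the window
];

// Date bounds, as in the corrected $match
const from = new Date('2015-08-01T00:00:00.000Z');
const to = new Date('2015-08-31T23:59:59.999Z');

// $match + $project + $group: count documents per day inside the window
const perDay = {};
for (const e of events) {
  if (e.some_date >= from && e.some_date <= to) {
    const day = e.some_date.toISOString().slice(0, 10); // e.g. '2015-08-01'
    perDay[day] = (perDay[day] || 0) + 1;
  }
}

// $sort: newest day first
const counts = Object.entries(perDay)
  .map(([_id, count]) => ({_id, count}))
  .sort((a, b) => (a._id < b._id ? 1 : -1));

console.log(counts);
```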