Meteor MongoDB aggregate - Collapsing certain values to 'Others' - mongodb

I am working on a toy project with Meteor and MongoDB to learn how to use them.
The MongoDB documents have a pretty simple structure:
{athlete: "Michael Jordan", brand: "Nike"}
{athlete: "Shaquille O'Neal", brand: "Reebok"}
etc..
I want to publish the number of athletes associated with a given brand to a D3.js method to create a piechart. This is easy to do.
The trouble I am having is with collecting brands below a certain threshold (in the example below, brands with only one athlete) into a single row to pass to the client. For example, if the brand distribution is:
{_id: "Nike", count:4} {_id: "Reebok",count:5} {_id: "Puma",count:3}{_id:"Adidas",count:1} {_id:"New Balance", count:1} {_id:"Under Armour", count:1}
I want the output to be:
{_id: "Nike", count:4} {_id: "Reebok",count:5} {_id: "Puma",count:3}{_id:"Others",count:3}
Here is the code I have tried:
pipeline = [
{$group:{
_id: "$brand",
count: {$sum: 1}
}},
{$project:{
_id:{
$cond:[ { $eq: [ "$count", 1 ] }, 'Others', "$_id" ]
},
count:{
$cond: [ { $eq: [ "$count", 1 ] },{$sum:"$count"}, "$count"]
}
}}
];
The _id condition does collapse the single-entry brands into 'Others'. If the _id attribute is the only thing passed, I can see the expected results in the browser console (a list of brands with more than one athlete, plus 'Others').
To get the count, I have tried three things:
Setting the true branch of the count $cond to {$sum:"$count"}, as above
Setting the true branch of the count $cond to {$sum:1}
Setting count to a non-conditional {$sum:"$count"}
All three result in an empty array being passed to the client. Any ideas on how to get this to work?
Solution:
Setting the pipeline as follows does what I described as desired behaviour in the previous section.
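pipeline = [
  // Stage 1: count the athletes per brand.
  {$group: {
    _id: "$brand",
    count: {$sum: 1}
  }},
  // Stage 2: regroup, sending every single-athlete brand to the 'Others' key
  // and summing the counts that land on the same key.
  {$group: {
    _id: {
      $cond: [ { $eq: [ "$count", 1 ] }, 'Others', "$_id" ]
    },
    count: {$sum: "$count"}
  }}
];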
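The same idea extends to any cutoff, not only brands with a single athlete. Below is a minimal sketch, assuming a hypothetical helper buildBrandPipeline(threshold) that is not part of Meteor or MongoDB, plus a plain-JavaScript emulation (collapseSmallBrands, also hypothetical) that mirrors the two $group stages so the logic can be checked without a database:

```javascript
// Hypothetical helper: builds the two-stage pipeline for an arbitrary cutoff.
function buildBrandPipeline(threshold) {
  return [
    { $group: { _id: "$brand", count: { $sum: 1 } } },
    {
      $group: {
        // Brands at or below the cutoff share the 'Others' key.
        _id: { $cond: [{ $lte: ["$count", threshold] }, "Others", "$_id"] },
        count: { $sum: "$count" },
      },
    },
  ];
}

// Plain-JavaScript emulation of the same two stages (no database needed).
function collapseSmallBrands(docs, threshold) {
  // First $group: count documents per brand.
  const perBrand = new Map();
  for (const { brand } of docs) {
    perBrand.set(brand, (perBrand.get(brand) || 0) + 1);
  }
  // Second $group: regroup by brand or 'Others' and sum the counts.
  const collapsed = new Map();
  for (const [brand, count] of perBrand) {
    const key = count <= threshold ? "Others" : brand;
    collapsed.set(key, (collapsed.get(key) || 0) + count);
  }
  return [...collapsed].map(([_id, count]) => ({ _id, count }));
}
```

With a threshold of 1 the pipeline behaves like the solution above; the key point is that the summing happens in a second $group, because $project works on one document at a time and cannot combine several brands into one row.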

Related

Limit number of objects pushed to array in MongoDB aggregation

I've been trying to find a way to limit the number of objects I'm pushing to the arrays I'm creating while using "aggregate" on a MongoDB collection.
I have a collection of students - each has these relevant keys:
class number it takes this semester (only one value),
percentile in class (exists if is enrolled in class, null if not),
current score in class (> 0 if enrolled in class, else - 0),
total average (GPA),
max grade
I need to group all students who never failed, per class, in one array that contains those with a GPA higher than 80, and another array containing those without this GPA, sorted by their score in this specific class.
This is my query:
db.getCollection("students").aggregate([
{"$match": {
"class_number":
{"$in": [49, 50, 16]},
"grades.curr_class.percentile":
{"$exists": true},
"grades.min": {"$gte": 80},
}},
{"$sort": {"grades.curr_class.score": -1}},
{"$group": {"_id": "$class_number",
"studentsWithHighGPA":
{"$push":
{"$cond": [{"$gte": ["$grades.gpa", 80]},
{"id": "$_id"},
"$$REMOVE"]
}
},
"studentsWithoutHighGPA":
{"$push":
{"$cond": [{"$lt": ["$grades.gpa", 80]},
{"id": "$_id"},
"$$REMOVE"]
},
},
},
},
])
What I'm trying to do is limit the number of students in each of these arrays. I only want the top 16 in each array, but I'm not sure how to approach this.
Thanks in advance!
I've tried using limit in different variations, and slice too, but none seem to work.
Since MongoDB version 5.0, one option is to use $setWindowFields for this, and in particular its $rank operator. This allows keeping only the relevant students and limiting their count even before the $group step:
$match only relevant students as suggested by the OP
$set the groupId for the $setWindowFields stage (as it can currently partition by one key only)
$setWindowFields to define the rank of each student in their array
$match only students with the wanted rank
$group by class_number as suggested by the OP:
db.collection.aggregate([
{$match: {
class_number: {$in: [49, 50, 16]},
"grades.curr_class.percentile": {$exists: true},
"grades.min": {$gte: 80}
}},
{$set: {
groupId: {$concat: [
{$toString: "$class_number"},
{$toString: {$toBool: {$gte: ["$grades.gpa", 80]}}}
]}
}},
{$setWindowFields: {
partitionBy: "$groupId",
sortBy: {"grades.curr_class.score": -1},
output: {rank: {$rank: {}}}
}},
{$match: {rank: {$lte: rankLimit}}},
{$group: {
_id: "$class_number",
studentsWithHighGPA: {$push: {
$cond: [{$gte: ["$grades.gpa", 80]}, {id: "$_id"}, "$$REMOVE"]}},
studentsWithoutHighGPA: {$push: {
$cond: [{$lt: ["$grades.gpa", 80]}, {id: "$_id"}, "$$REMOVE"]}}
}}
])
See how it works on the playground example
*This solution limits the students by rank, so there is an edge case in which an array ends up with more than n students (when several students tie at exactly rank n). It can be solved by adding a $slice step.
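The $slice step mentioned above could be sketched as one extra stage appended after the final $group. This is only an illustration: the value of rankLimit is an assumption taken from the question's "top 16", and trimStage is a hypothetical name.

```javascript
// Assumed cap, taken from the question's "top 16".
const rankLimit = 16;

// Stage to append after the final $group:
// keep only the first rankLimit entries of each array.
const trimStage = {
  $set: {
    studentsWithHighGPA: { $slice: ["$studentsWithHighGPA", rankLimit] },
    studentsWithoutHighGPA: { $slice: ["$studentsWithoutHighGPA", rankLimit] },
  },
};
```

Another option that may avoid the tie problem altogether is to use $documentNumber instead of $rank in the window output, since it numbers tied documents consecutively rather than giving them the same rank.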
Maybe MongoDB's $facet stage is a solution. You can specify different output pipelines in one aggregation call.
Something like this:
const pipeline = [
{
'$facet': {
'studentsWithHighGPA': [
{ '$match': { 'grade': { '$gte': 80 } } },
{ '$sort': { 'grade': -1 } },
{ '$limit': 16 }
],
'studentsWithoutHighGPA': [
{ '$match': { 'grade': { '$lt': 80 } } },
{ '$sort': { 'grade': -1 } },
{ '$limit': 16 }
]
}
}
];
coll.aggregate(pipeline)
This should end up with one document including two arrays.
studentsWithHighGPA (array)
0 (object)
1 (object)
...
studentsWithoutHighGPA (array)
0 (object)
1 (object)
See each facet as an aggregation pipeline on its own. So you can also include $group to group by classes or something else.
https://www.mongodb.com/docs/manual/reference/operator/aggregation/facet/
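Adapting the $facet idea to the field names in the question might look like the sketch below. It is an assumption-laden illustration, not a tested answer: perClassTop is a hypothetical helper, the field paths are copied from the question, and it relies on $push keeping the order produced by the preceding $sort.

```javascript
// Hypothetical helper: one facet sub-pipeline that returns, per class,
// at most `limit` students matching `gpaFilter`, best class score first.
const perClassTop = (gpaFilter, limit) => [
  { $match: { "grades.gpa": gpaFilter } },
  { $sort: { "grades.curr_class.score": -1 } },
  { $group: { _id: "$class_number", students: { $push: { id: "$_id" } } } },
  { $project: { students: { $slice: ["$students", limit] } } },
];

const pipeline = [
  // Same filter as in the question.
  {
    $match: {
      class_number: { $in: [49, 50, 16] },
      "grades.curr_class.percentile": { $exists: true },
      "grades.min": { $gte: 80 },
    },
  },
  {
    $facet: {
      studentsWithHighGPA: perClassTop({ $gte: 80 }, 16),
      studentsWithoutHighGPA: perClassTop({ $lt: 80 }, 16),
    },
  },
];
```

Each facet then holds one entry per class instead of one flat list, at the cost of pushing every matching student into memory before the $slice.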
I don't think there is a mongodb-provided operator to apply a limit inside of a $group stage.
You could use $accumulator, but that requires server-side scripting to be enabled, and may have performance impact.
Limiting studentsWithHighGPA to 16 throughout the grouping might look something like:
"studentsWithHighGPA": {
  "$accumulator": {
    // The state is a list of {_id, score} candidates.
    init: function() {
      return {combined: []};
    },
    // Keep a student only when the GPA is at least 80, and cap the list at 16.
    accumulate: function(state, id, score) {
      if (score >= 80) {
        state.combined.push({_id: id, score: score});
      }
      return {combined: state.combined.slice(0, 16)};
    },
    accumulateArgs: ["$_id", "$grades.gpa"],
    // Combine two partial states, highest score first.
    merge: function(A, B) {
      return {combined:
        A.combined.concat(B.combined).sort(
          function(SA, SB) {
            return (SB.score - SA.score);
          })
      };
    },
    // Emit at most 16 entries and drop the score.
    finalize: function(s) {
      return s.combined.slice(0, 16).map(function(A) {
        return {_id: A._id};
      });
    },
    lang: "js"
  }
}
Note that the score is also carried through until the very end so that partial result sets from different shards can be combined properly.
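On MongoDB 5.2 or later, the $topN accumulator may remove the need for server-side JavaScript. The following is a sketch under that assumption, not something taken from the answers above: it groups by class and by a high-GPA flag instead of building two arrays in one group, and the key names classNumber and highGpa are made up for illustration.

```javascript
const pipeline = [
  // Same filter as in the question.
  {
    $match: {
      class_number: { $in: [49, 50, 16] },
      "grades.curr_class.percentile": { $exists: true },
      "grades.min": { $gte: 80 },
    },
  },
  {
    $group: {
      // One group per (class, high-GPA flag) pair; key names are hypothetical.
      _id: {
        classNumber: "$class_number",
        highGpa: { $gte: ["$grades.gpa", 80] },
      },
      // Keep only the 16 students with the best score in the current class.
      students: {
        $topN: {
          n: 16,
          sortBy: { "grades.curr_class.score": -1 },
          output: { id: "$_id" },
        },
      },
    },
  },
];
```

This yields two documents per class (highGpa true and false) rather than one document with two arrays; a further $group by _id.classNumber could merge them if the original shape is required.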

Find missing elements from the array passed to a MongoDB query

For example:
animals = ['cat','mat','rat'];
collection contains only 'cat' and 'mat'
I want the query to return 'rat', which is not in the collection.
collection contains
[
{
_id:objectid,
animal:'cat'
},
{
_id:objectid,
animal:'mat'
}
]
db.collection.find({'animal':{$nin:animals}})
(or)
db.collection.find({'animal':{$nin:['cat','mat','rat']}})
EDIT:
One option is:
Use $facet to $group all existing values into a set. Using $facet allows the pipeline to continue even if the collection is empty, as #leoll2 mentioned.
$project with $cond to handle both cases: with or without data.
Find the set difference
db.collection.aggregate([
{$facet: {data: [{$group: {_id: 0, animals: {$addToSet: "$animal"}}}]}},
{$project: {
data: {
$cond: [{$gt: [{$size: "$data"}, 0]}, {$first: "$data"}, {animals: []}]
}
}},
{$project: {data: "$data.animals"}},
{$project: {_id: 0, missing: {$setDifference: [animals, "$data"]}}}
])
See how it works on the playground example - with data or playground example - without data
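The set-difference step can be checked without a database. Here is a minimal plain-JavaScript sketch of what the pipeline computes; missingValues is a hypothetical helper name, and the documents are assumed to be already fetched into an array:

```javascript
// Returns the wanted values that do not occur in `field` of any document.
// Mirrors $addToSet followed by $setDifference, including the empty case.
function missingValues(wanted, existingDocs, field) {
  const existing = new Set(existingDocs.map((doc) => doc[field]));
  // $setDifference also drops duplicate entries, so deduplicate here too.
  return [...new Set(wanted)].filter((value) => !existing.has(value));
}

const animals = ["cat", "mat", "rat"];
const stored = [{ animal: "cat" }, { animal: "mat" }];
console.log(missingValues(animals, stored, "animal"));
```

With an empty collection every wanted value counts as missing, which is the case the $facet and $cond stages exist to handle.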

Sort element with property: true to the top, but only one out of many

My app can search through a database of resources using MongoDB's aggregation pipeline. Some of these documents have the property sponsored: true.
I want to move exactly one of these sponsored entries to the top of the search results, but keep the natural ordering for the remaining ones (no matter whether they are sponsored or not).
Below is my code. My idea was to make use of addFields but change the logic so that it only applies to the first element that meets the condition. Is this possible?
[...]
const aggregationResult = await Resource.aggregate()
.search({
compound: {
must: [
[...]
],
should: [
[...]
]
}
})
[...]
//only do this for the first sponsored result
.addFields({
selectedForSponsoredSlot: { $cond: [{ $eq: ['$sponsored', true] }, true, false] }
})
.sort(
{
selectedForSponsoredSlot: -1,
_id: 1
}
)
.facet({
results: [
{ $match: matchFilter },
{ $skip: (page - 1) * pageSize },
{ $limit: pageSize },
],
totalResultCount: [
{ $match: matchFilter },
{ $group: { _id: null, count: { $sum: 1 } } }
],
[...]
})
.exec();
[...]
Update:
One option is to change your $facet a bit:
You can move the $match out of the $facet, since it is relevant to all the pipelines.
Instead of two pipelines, one for the results and one for the count, there are now three: one more for sponsored documents only.
Remove items that were already seen on previous pages, according to the sponsored item's relevance score.
Remove the item that is in the sposerted array (the sponsored document) from the allDocs array, if it is on this page.
$slice the allDocs array to the right size, so that together with the sponsored item it fills the wanted pageSize.
$project to concatenate the sposerted and allDocs arrays.
db.collection.aggregate([
{$sort: {relevance: -1, _id: 1}},
{$match: matchFilter},
{$facet: {
allDocs: [{$skip: (page - 1) * (pageSize - 1)}, {$limit: pageSize + 1 }],
sposerted: [{$match: {sponsored: true}}, {$limit: 1}],
count: [{$count: "total"}]
}},
{$set: {
allDocs: {
$slice: [
"$allDocs",
{$cond: [{$gte: [{$first: "$sposerted.relevance"},
{$first: "$allDocs.relevance"}]}, 1, 0]},
pageSize + 1
]
}
}},
{$set: {
allDocs: {
$filter: {
input: "$allDocs",
cond: {$not: {$in: ["$$this._id", "$sposerted._id"]}}
}
}
}},
{$set: {allDocs: {$slice: ["$allDocs", 0, (pageSize - 1)]}}},
{$project: {
results: {
$concatArrays: [ "$sposerted", "$allDocs"]},
totalResultCount: {$first: "$count.total"}
}}
])
See how it works on the playground example
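The intended ordering can be stated compactly in plain JavaScript, which may help when checking the pipeline's output. This is a minimal sketch for a single, already sorted list; it ignores pagination, and promoteFirstSponsored is a hypothetical helper name:

```javascript
// Moves the first sponsored document to the front and leaves the
// relative order of every other document untouched.
function promoteFirstSponsored(results) {
  const index = results.findIndex((doc) => doc.sponsored === true);
  // Nothing to do when no document is sponsored or it is already first.
  if (index <= 0) return results.slice();
  return [
    results[index],
    ...results.slice(0, index),
    ...results.slice(index + 1),
  ];
}
```

Only one sponsored document moves; any later sponsored documents stay where the natural ordering put them.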

Scala / MongoDB - removing duplicate

I have seen very similar questions with solutions to this problem, but I am unsure how I would incorporate them into my own query. I'm programming in Scala and using the MongoDB Aggregates "framework".
val getItems = Seq (
Aggregates.lookup(Store...)...
Aggregates.lookup(Store.STORE_NAME, "relationship.itemID", "uniqueID", "item"),
Aggregates.unwind("$item"),
// filter duplicates here ?
Aggregates.lookup(Store.STORE_NAME, "item.content", "ID", "content"),
Aggregates.unwind("$content"),
Aggregates.project(Projections.fields(Projections.include("store", "item", "content")))
)
The query returns duplicate objects which is undesirable. I would like to remove these. How could I go about incorporating Aggregates.group and "$addToSet" to do this? Or any other reasonable solution would be great too.
Note: I have to omit some details about the query, so the store lookup aggregate is not there. However, I want to remove the duplicates later in the query so it hopefully shouldn't matter.
Please let me know if I need to provide more information.
Thanks.
EDIT: 31/ 07/ 2019: 13:47
I have tried the following:
val getItems = Seq (
Aggregates.lookup(Store...)...
Aggregates.lookup(Store.STORE_NAME, "relationship.itemID", "uniqueID", "item"),
Aggregates.unwind("$item"),
Aggregates.group("$item.itemID",
Accumulators.first("ID", "$ID"),
Accumulators.first("itemName", "$itemName"),
Accumulators.addToSet("item", "$item")),
Aggregates.unwind("$items"),
Aggregates.lookup(Store.STORE_NAME, "item.content", "ID", "content"),
Aggregates.unwind("$content"),
Aggregates.project(Projections.fields(Projections.include("store", "items", "content")))
)
But my query now returns zero results instead of the duplicate result.
You can use $first to remove the duplicates.
Suppose I have the following data:
[
{"_id": 1,"item": "ABC","sizes": ["S","M","L"]},
{"_id": 2,"item": "EFG","sizes": []},
{"_id": 3, "item": "IJK","sizes": "M" },
{"_id": 4,"item": "LMN"},
{"_id": 5,"item": "XYZ","sizes": null}
]
Now, let's aggregate it using $first and $unwind and see the difference:
First let's aggregate it using $first
db.collection.aggregate([
{ $sort: {
item: 1
}
},
{ $group: {
_id: "$item",firstSize: {$first: "$sizes"}}}
])
Output
[
{"_id": "XYZ","firstSize": null},
{"_id": "ABC","firstSize": ["S","M","L" ]},
{"_id": "IJK","firstSize": "M"},
{"_id": "EFG","firstSize": []},
{"_id": "LMN","firstSize": null}
]
Now, Let's aggregate it using $unwind
db.collection.aggregate([
{
$unwind: "$sizes"
}
])
Output
[
{"_id": 1,"item": "ABC","sizes": "S"},
{"_id": 1,"item": "ABC","sizes": "M"},
{"_id": 1,"item": "ABC","sizes": "L"},
{"_id": 3,"item": "IJK","sizes": "M"}
]
You can see that $first removes the duplicates, whereas $unwind keeps them.
Using $unwind and $first together.
db.collection.aggregate([
{ $unwind: "$sizes"},
{
$group: {
_id: "$item",firstSize: {$first: "$sizes"}}
}
])
Output
[
{"_id": "IJK", "firstSize": "M"},
{"_id": "ABC","firstSize": "S"}
]
Grouping and then using $addToSet is an effective way to deal with your problem.
It looks like this in the mongo shell:
db.sales.aggregate(
[
{
$group:
{
_id: { day: { $dayOfYear: "$date"}, year: { $year: "$date" } },
itemsSold: { $addToSet: "$item" }
}
}
]
)
In Scala you can do it like this:
Aggregates.group("$groupfield", Accumulators.addToSet("fieldName","$expression"))
If you have multiple fields to group by:
Aggregates.group(new BasicDBObject().append("fieldAname","$fieldA").append("fieldBname","$fieldB"), Accumulators.addToSet("fieldName","$expression"))
Then unwind.
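In mongo shell syntax, keeping one document per key could be sketched with $group, $first and $replaceRoot. The key path item.uniqueID is an assumption based on the lookup in the question, and dedupeByKey is a hypothetical plain-JavaScript helper that emulates the same behaviour for checking:

```javascript
// Shell-style stages: keep the first full document seen for each key.
// "$item.uniqueID" is an assumed key path; adjust it to the real field.
const dedupeStages = [
  { $group: { _id: "$item.uniqueID", doc: { $first: "$$ROOT" } } },
  { $replaceRoot: { newRoot: "$doc" } },
];

// Plain-JavaScript emulation of the two stages above.
function dedupeByKey(docs, keyOf) {
  const firstSeen = new Map();
  for (const doc of docs) {
    const key = keyOf(doc);
    // Like $first: later documents with the same key are ignored.
    if (!firstSeen.has(key)) firstSeen.set(key, doc);
  }
  return [...firstSeen.values()];
}
```

Unlike $addToSet followed by $unwind, this keeps the remaining fields of the document intact, so later lookups can still use them.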

How to find out key-value which matches to given value for a week using mongo?

I have following collection structure -
[{
"runtime":1417510501850,
"vms":[{
"name":"A",
"state":"on",
},{
"name":"B",
"state":"off",
}]
},
{
"runtime":1417510484000,
"vms":[{
"name":"A",
"state":"on",
}, {
"name":"B",
"state":"off",
}]
},{
"runtime":1417510184000,
"vms":[{
"name":"A",
"state":"off",
}, {
"name":"B",
"state":"off",
}]
},{
"runtime":1417509884000,
"vms":[{
"name":"A",
"state":"on",
}, {
"name":"B",
"state":"off",
}]
},{
"runtime":1416905084000,
"vms":[{
"name":"A",
"state":"on",
}, {
"name":"B",
"state":"off",
}]
}
]
The difference between these two documents is 5 minutes which is represented by 'runtime'.
I have many such documents.
I want to find names whose state has been off for a week. The only condition is that the state should be off throughout the week (there should not be a single 'on' value for the state key).
e.g. in the above data, if name 'B' has been off for one week (taking 1417510501850 as the current timestamp), then my expected output will be -
{
"name":"B",
"state":"off"
}
Currently I am doing following-
1) find documents with state 'off' which are greater than 1 week using (currentTimestamp- 60*60*24*7)
2) Apply loop to result to find name and check state.
Can anybody help to get above output??
I suppose the query should be like this:
db.yourcollection.aggregate([{$unwind: "$vms"}, //unwind for convenience
{$match: {"vms.state": {$eq: "off"}, runtime: {$gte: Date.now() - 7*24*60*60*1000}}}, //filter: state off, runtime within the last week
{$project: {name: "$vms.name", state: "$vms.state"}}]) //projection
UPDATE
This is the corrected query, which returns only names that had no "on" state during the week. It is a bit more involved; see the comments:
db.yourcollection.aggregate([{$unwind: "$vms"}, //unwind for convenience
{$match: {runtime: {$gte: Date.now() - 7*24*60*60*1000}}}, //keep only documents from the last week
{$project: {_id: "$vms.name", state: "$vms.state"}}, //projection so docs will be like {_id: "a", state: "on"}
{$group: {_id: "$_id", states: {$push: "$state"}}}, //group by id to see all states in an array
{$match: {states: {$eq: "off", $ne: "on"}}}, //take only docs which have state "off" and do not have state "on"
{$project: {_id: "$_id", state: {$literal: "off"}}}]) //and convert to the required output
To understand this query, it is a good idea to add the pipeline stages to the aggregate call one at a time and check the result after each.
Hope this helps.
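The logic can also be checked in plain JavaScript against the sample data. This is a minimal sketch, assuming that "for a week" means snapshots whose runtime falls within the last seven days; namesOffSince is a hypothetical helper, not part of any MongoDB API:

```javascript
// Returns the names that were "off" in every snapshot inside the window.
function namesOffSince(docs, now, windowMs) {
  const cutoff = now - windowMs;
  const statesByName = new Map();
  for (const doc of docs) {
    if (doc.runtime < cutoff) continue; // older than the window: ignore
    for (const vm of doc.vms) {
      if (!statesByName.has(vm.name)) statesByName.set(vm.name, new Set());
      statesByName.get(vm.name).add(vm.state);
    }
  }
  // Keep names that were seen "off" and never seen "on".
  return [...statesByName]
    .filter(([, states]) => states.has("off") && !states.has("on"))
    .map(([name]) => ({ name, state: "off" }));
}

const WEEK_MS = 7 * 24 * 60 * 60 * 1000;
```

Calling namesOffSince(docs, 1417510501850, WEEK_MS) on the documents from the question takes 1417510501850 as the current timestamp, as the question does.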