How can I weight each MongoDB search query differently? - mongodb

I have a MongoDB collection that looks something like this:
{
"name": "McAllister's Deli",
"menu": [
{"sandwich": 4},
{"spud": 3},
{"salad": 5},
{"cookie":2}
],
"reviews": 45
}
I would like to rank these restaurants based on the types of food they have and the number of reviews. For instance, if someone is looking for cookie and sandwich, McAllister's Deli would return a ranking of say 19.28 by taking (cookie * sandwich * reviews) / menuItems. Is there a way to optimize my query to take this ranking into account?
Edit: Since it was asked in a comment, I am currently using the Dart driver, but I am familiar with the Mongo shell and can translate a shell query to a query my driver understands.

db.rest.aggregate([{
  $project: {
    _id: 1,
    sandwich: { $arrayElemAt: ['$menu.sandwich', 0] },
    cookie: { $arrayElemAt: ['$menu.cookie', 0] },
    menuItems: { $size: '$menu' }
  }
}, {
  $project: {
    _id: 1,
    rank: { $divide: [{ $add: ['$sandwich', '$cookie'] }, '$menuItems'] }
  }
}
])
The two $project stages can probably be combined; they are kept separate here for clarity.
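To make the arithmetic concrete, here is a minimal in-memory sketch in plain JavaScript (no MongoDB connection needed). It assumes the sample document from the question; the helper names menuScore, additiveRank and productRank are made up for illustration. additiveRank mirrors the sum-based rank of the pipeline above, while productRank follows the formula as written in the question.

```javascript
// In-memory sketch of the ranking; the sample document comes from the question.
const restaurant = {
  name: "McAllister's Deli",
  menu: [{ sandwich: 4 }, { spud: 3 }, { salad: 5 }, { cookie: 2 }],
  reviews: 45,
};

// Hypothetical helper: score of one menu item, or null if the restaurant lacks it.
function menuScore(doc, item) {
  const entry = doc.menu.find((m) => item in m);
  return entry ? entry[item] : null;
}

// Same arithmetic as the pipeline: sum of the requested items / number of menu items.
function additiveRank(doc, items) {
  const scores = items.map((i) => menuScore(doc, i));
  if (scores.includes(null)) return null;
  return scores.reduce((a, b) => a + b, 0) / doc.menu.length;
}

// The question's formula: (product of the requested items * reviews) / number of menu items.
function productRank(doc, items) {
  const scores = items.map((i) => menuScore(doc, i));
  if (scores.includes(null)) return null;
  return (scores.reduce((a, b) => a * b, 1) * doc.reviews) / doc.menu.length;
}

console.log(additiveRank(restaurant, ["cookie", "sandwich"])); // 1.5
console.log(productRank(restaurant, ["cookie", "sandwich"])); // 90
```

If the product form is what you want in the pipeline itself, one option would be to carry '$reviews' through the first $project and use $multiply in place of $add.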

Related

Limit number of objects pushed to array in MongoDB aggregation

I've been trying to find a way to limit the number of objects I'm pushing to the arrays I create while using "aggregate" on a MongoDB collection.
I have a collection of students - each has these relevant keys:
class number it takes this semester (only one value),
percentile in class (exists if is enrolled in class, null if not),
current score in class (> 0 if enrolled in class, else - 0),
total average (GPA),
max grade
I need to group all students who never failed, per class, in one array that contains those with a GPA higher than 80, and another array containing those without this GPA, sorted by their score in this specific class.
This is my query:
db.getCollection("students").aggregate([
{"$match": {
"class_number":
{"$in": [49, 50, 16]},
"grades.curr_class.percentile":
{"$exists": true},
"grades.min": {"$gte": 80},
}},
{"$sort": {"grades.curr_class.score": -1}},
{"$group": {"_id": "$class_number",
"studentsWithHighGPA":
{"$push":
{"$cond": [{"$gte": ["$grades.gpa", 80]},
{"id": "$_id"},
"$$REMOVE"]
}
},
"studentsWithoutHighGPA":
{"$push":
{"$cond": [{"$lt": ["$grades.gpa", 80]},
{"id": "$_id"},
"$$REMOVE"]
},
},
},
},
])
What I'm trying to do is limit the number of students in each of these arrays. I only want the top 16 in each array, but I'm not sure how to approach this.
Thanks in advance!
I've tried using limit in different variations, and slice too, but none seem to work.
Since MongoDB 5.0, one option is to use $setWindowFields for this, in particular its $rank operator. This lets you keep only the relevant students and limit their count even before the $group step:
1) $match only the relevant students, as in the OP's query
2) $set the groupId for $setWindowFields (as it can currently partition by one key only)
3) $setWindowFields to define the rank of each student within their array
4) $match only the students with the wanted rank
5) $group by class_number, as in the OP's query:
const rankLimit = 16; // keep the top 16 students per array, as asked in the question
db.collection.aggregate([
{$match: {
class_number: {$in: [49, 50, 16]},
"grades.curr_class.percentile": {$exists: true},
"grades.min": {$gte: 80}
}},
{$set: {
groupId: {$concat: [
{$toString: "$class_number"},
{$toString: {$toBool: {$gte: ["$grades.gpa", 80]}}}
]}
}},
{$setWindowFields: {
partitionBy: "$groupId",
sortBy: {"grades.curr_class.score": -1},
output: {rank: {$rank: {}}}
}},
{$match: {rank: {$lte: rankLimit}}},
{$group: {
_id: "$class_number",
studentsWithHighGPA: {$push: {
$cond: [{$gte: ["$grades.gpa", 80]}, {id: "$_id"}, "$$REMOVE"]}},
studentsWithoutHighGPA: {$push: {
$cond: [{$lt: ["$grades.gpa", 80]}, {id: "$_id"}, "$$REMOVE"]}}
}}
])
See how it works on the playground example
*This solution limits students by rank, so there is an edge case where an array ends up with more than n students (when several students tie at exactly rank n). This can be solved by adding a $slice step.
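The tie edge case can be sketched in plain JavaScript. This is only an in-memory illustration: rankByScore is a hypothetical helper that imitates how $rank numbers documents (ties share a rank and the next rank skips ahead), and the four students are invented sample data.

```javascript
// Hypothetical helper imitating $rank: ties share a rank, the next rank skips ahead.
function rankByScore(students) {
  const sorted = [...students].sort((a, b) => b.score - a.score);
  let rank = 0;
  return sorted.map((s, i) => {
    if (i === 0 || s.score !== sorted[i - 1].score) rank = i + 1;
    return { ...s, rank };
  });
}

// Invented sample data: b and c tie on score.
const students = [
  { id: "a", score: 95 },
  { id: "b", score: 90 },
  { id: "c", score: 90 },
  { id: "d", score: 85 },
];

const limit = 2;
// Filtering on rank alone lets both tied students through.
const ranked = rankByScore(students).filter((s) => s.rank <= limit);
// Slicing afterwards caps the array at exactly `limit` entries.
const capped = ranked.slice(0, limit);

console.log(ranked.map((s) => s.id)); // [ 'a', 'b', 'c' ]
console.log(capped.map((s) => s.id)); // [ 'a', 'b' ]
```

In the pipeline, the equivalent would be a final stage that applies $slice to each pushed array, for example { $slice: ["$studentsWithHighGPA", 16] }.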
Maybe MongoDB's $facet stage is a solution. You can specify different output pipelines in one aggregation call.
Something like this:
const pipeline = [
{
'$facet': {
'studentsWithHighGPA': [
{ '$match': { 'grade': { '$gte': 80 } } },
{ '$sort': { 'grade': -1 } },
{ '$limit': 16 }
],
'studentsWithoutHighGPA': [
{ '$match': { 'grade': { '$lt': 80 } } },
{ '$sort': { 'grade': -1 } },
{ '$limit': 16 }
]
}
}
];
coll.aggregate(pipeline)
This should end up with one document including two arrays.
studentsWithHighGPA (array)
0 (object)
1 (object)
...
studentsWithoutHighGPA (array)
0 (object)
1 (object)
See each facet as an aggregation pipeline on its own. So you can also include $group to group by classes or something else.
https://www.mongodb.com/docs/manual/reference/operator/aggregation/facet/
I don't think there is a mongodb-provided operator to apply a limit inside of a $group stage.
You could use $accumulator, but that requires server-side scripting to be enabled, and may have performance impact.
Limiting studentsWithHighGPA to 16 throughout the grouping might look something like:
"studentsWithHighGPA": {
  "$accumulator": {
    // start each group with an empty list
    init: function() {
      return {combined: []};
    },
    // keep a student only if the GPA is at least 80, and cap the list at 16
    accumulate: function(state, id, score) {
      if (score >= 80) {
        state.combined.push({_id: id, score: score});
      }
      return {combined: state.combined.slice(0, 16)};
    },
    accumulateArgs: ["$_id", "$grades.gpa"],
    // combine partial results, highest score first
    merge: function(A, B) {
      return {combined:
        A.combined.concat(B.combined).sort(
          function(SA, SB) {
            return (SB.score - SA.score);
          })
      };
    },
    // drop the score and return at most 16 ids
    finalize: function(s) {
      return s.combined.slice(0, 16).map(function(A) {
        return {_id: A._id};
      });
    },
    lang: "js"
  }
}
Note that the score is also carried through until the very end so that partial result sets from different shards can be combined properly.
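The lifecycle of those four functions can be walked through in plain JavaScript, without a server. This is a rough sketch under assumptions: makeAccumulator and runShard are hypothetical names, the two "shards" are invented sample data already sorted by GPA descending, and the limit is lowered to 2 so the cap is visible.

```javascript
// Plain-JavaScript walk-through of init / accumulate / merge / finalize.
function makeAccumulator(limit) {
  return {
    init: () => ({ combined: [] }),
    // Keep students with GPA >= 80; relies on input arriving sorted, best first.
    accumulate: (state, id, score) => {
      if (score >= 80) state.combined.push({ _id: id, score });
      return { combined: state.combined.slice(0, limit) };
    },
    // Combine two partial states and re-sort by score, highest first.
    merge: (a, b) => ({
      combined: a.combined.concat(b.combined).sort((x, y) => y.score - x.score),
    }),
    // Apply the cap once more and drop the score from the output.
    finalize: (s) => s.combined.slice(0, limit).map((x) => ({ _id: x._id })),
  };
}

// Invented per-shard inputs, each already sorted by GPA descending.
const shardA = [{ _id: 1, gpa: 95 }, { _id: 2, gpa: 85 }, { _id: 3, gpa: 82 }];
const shardB = [{ _id: 4, gpa: 90 }, { _id: 5, gpa: 60 }];

const acc = makeAccumulator(2);
const runShard = (docs) =>
  docs.reduce((state, d) => acc.accumulate(state, d._id, d.gpa), acc.init());

const merged = acc.merge(runShard(shardA), runShard(shardB));
const top = acc.finalize(merged);
console.log(top); // [ { _id: 1 }, { _id: 4 } ]
```

Student 4 ends up ahead of student 2 only because the score survives until merge, which is the point made in the note above.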

MongoDB - Obtain full document of a group taking into account the minimum value of one property

Good afternoon, I'm just starting with MongoDB and I have a question about the $group aggregation.
From the following set of documents, I need to get the cheapest room among similar ones (grouping by room identifier).
{"_id":"874521035","provider":{"id":{"$numberInt":"2"},"name":"HotelBeds"},"accommodation":{"id":{"$numberInt":"36880"},"name":"Hotel Goya"},"room":{"id":{"$numberInt":"1"},"name":"Doble"},"board":{"id":{"$numberInt":"1"},"name":"Sólo alojamiento"},"fare":{"id":"NRF","name":"No reembolsable"},"price":{"cost":{"$numberInt":"115"},"net":{"$numberInt":"116"},"pvp":{"$numberInt":"126"}},"fees":{"agency":{"$numberInt":"10"},"cdv":{"$numberInt":"1"}},"cancellation-deadeline":"2019-12-31","payment-deadeline":"2019-12-30"}
{"_id":"123456789","provider":{"id":{"$numberInt":"2"},"name":"HotelBeds"},"accommodation":{"id":{"$numberInt":"36880"},"name":"Hotel Goya"},"room":{"id":{"$numberInt":"1"},"name":"Doble"},"board":{"id":{"$numberInt":"2"},"name":"Alojamiento y desayuno"},"fare":{"id":"NOR","name":"Reembolsable"},"price":{"cost":{"$numberInt":"120"},"net":{"$numberInt":"121"},"pvp":{"$numberInt":"131"}},"fees":{"agency":{"$numberInt":"10"},"cdv":{"$numberInt":"1"}},"cancellation-deadeline":"2019-12-31","payment-deadeline":"2019-12-30"}
{"_id":"987654321","provider":{"id":{"$numberInt":"2"},"name":"HotelBeds"},"accommodation":{"id":{"$numberInt":"36880"},"name":"Hotel Goya"},"room":{"id":{"$numberInt":"2"},"name":"Triple"},"board":{"id":{"$numberInt":"1"},"name":"Sólo alojamiento"},"fare":{"id":"NOR","name":"Reembolsable"},"price":{"cost":{"$numberInt":"125"},"net":{"$numberInt":"126"},"pvp":{"$numberInt":"136"}},"fees":{"agency":{"$numberInt":"10"},"cdv":{"$numberInt":"1"}},"cancellation-deadeline":"2019-12-31","payment-deadeline":"2019-12-30"}
{"_id":"852963147","provider":{"id":{"$numberInt":"2"},"name":"HotelBeds"},"accommodation":{"id":{"$numberInt":"36880"},"name":"Hotel Goya"},"room":{"id":{"$numberInt":"3"},"name":"Doble uso individual"},"board":{"id":{"$numberInt":"1"},"name":"Sólo alojamiento"},"price":{"cost":{"$numberInt":"99"},"net":{"$numberInt":"100"},"pvp":{"$numberInt":"110"}},"fees":{"agency":{"$numberInt":"10"},"cdv":{"$numberInt":"1"}},"cancellation-deadeline":"2019-12-31","payment-deadeline":"2019-12-30"}
So far I have only managed to obtain the cheapest price, the room identifier and the number of repetitions.
db.consolidation.aggregate([
  {
    $group: {
      _id: "$room.id",
      "cheapest": {$min: "$price.pvp"},
      "qty": {$sum: 1}
    }
  }]);
{"_id": 2, "cheapest": 136, "qty": 1}
{"_id": 3, "cheapest": 110, "qty": 1}
{"_id": 1, "cheapest": 126, "qty": 2}
While investigating, I have seen that a document's data can be obtained with $first or $last, but that is not the data I need, since it depends on the position of the document.
What I need is to obtain, for each room, the full document with the cheapest price. This is the result I expect:
{"_id":"874521035","provider":{"id":{"$numberInt":"2"},"name":"HotelBeds"},"accommodation":{"id":{"$numberInt":"36880"},"name":"Hotel Goya"},"room":{"id":{"$numberInt":"1"},"name":"Doble"},"board":{"id":{"$numberInt":"1"},"name":"Sólo alojamiento"},"fare":{"id":"NRF","name":"No reembolsable"},"price":{"cost":{"$numberInt":"115"},"net":{"$numberInt":"116"},"pvp":{"$numberInt":"126"}},"fees":{"agency":{"$numberInt":"10"},"cdv":{"$numberInt":"1"}},"cancellation-deadeline":"2019-12-31","payment-deadeline":"2019-12-30"}
{"_id":"987654321","provider":{"id":{"$numberInt":"2"},"name":"HotelBeds"},"accommodation":{"id":{"$numberInt":"36880"},"name":"Hotel Goya"},"room":{"id":{"$numberInt":"2"},"name":"Triple"},"board":{"id":{"$numberInt":"1"},"name":"Sólo alojamiento"},"fare":{"id":"NOR","name":"Reembolsable"},"price":{"cost":{"$numberInt":"125"},"net":{"$numberInt":"126"},"pvp":{"$numberInt":"136"}},"fees":{"agency":{"$numberInt":"10"},"cdv":{"$numberInt":"1"}},"cancellation-deadeline":"2019-12-31","payment-deadeline":"2019-12-30"}
{"_id":"852963147","provider":{"id":{"$numberInt":"2"},"name":"HotelBeds"},"accommodation":{"id":{"$numberInt":"36880"},"name":"Hotel Goya"},"room":{"id":{"$numberInt":"3"},"name":"Doble uso individual"},"board":{"id":{"$numberInt":"1"},"name":"Sólo alojamiento"},"price":{"cost":{"$numberInt":"99"},"net":{"$numberInt":"100"},"pvp":{"$numberInt":"110"}},"fees":{"agency":{"$numberInt":"10"},"cdv":{"$numberInt":"1"}},"cancellation-deadeline":"2019-12-31","payment-deadeline":"2019-12-30"}
I hope I have explained myself clearly.
Thanks in advance.
Regards.
You can capture $$ROOT as part of your $group stage and then use $filter to compare the list of rooms against the min value. $replaceRoot will then give you back the original document shape:
db.collection.aggregate([
{
$group: {
_id: "$room.id",
"cheapest": {
$min: "$price.pvp"
},
"qty": { $sum: 1 },
docs: { $push: "$$ROOT" }
}
},
{
$replaceRoot: {
newRoot: { $arrayElemAt: [ { $filter: { input: "$docs", cond: { $eq: [ "$$this.price.pvp", "$cheapest" ] } } }, 0 ] }
}
}
])
Mongo Playground
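What the pipeline computes can be sketched in plain JavaScript: for each room id, keep the whole document with the lowest price.pvp. This is an in-memory illustration only; cheapestPerRoom is a hypothetical helper, and the documents are trimmed to the fields that matter (the _id, room.id and price.pvp values are taken from the question).

```javascript
// Trimmed copies of the question's documents: only _id, room.id and price.pvp.
const offers = [
  { _id: "874521035", room: { id: 1 }, price: { pvp: 126 } },
  { _id: "123456789", room: { id: 1 }, price: { pvp: 131 } },
  { _id: "987654321", room: { id: 2 }, price: { pvp: 136 } },
  { _id: "852963147", room: { id: 3 }, price: { pvp: 110 } },
];

// Hypothetical helper: keep, per room id, the full document with the lowest pvp.
// On a tie the first document seen wins, like taking element 0 of the filtered list.
function cheapestPerRoom(docs) {
  const best = new Map();
  for (const doc of docs) {
    const current = best.get(doc.room.id);
    if (!current || doc.price.pvp < current.price.pvp) {
      best.set(doc.room.id, doc);
    }
  }
  return [...best.values()];
}

console.log(cheapestPerRoom(offers).map((d) => d._id));
// [ '874521035', '987654321', '852963147' ]
```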

Scala / MongoDB - removing duplicate

I have seen very similar questions with solutions to this problem, but I am unsure how I would incorporate it in to my own query. I'm programming in Scala and using a MongoDB Aggregates "framework".
val getItems = Seq (
Aggregates.lookup(Store...)...
Aggregates.lookup(Store.STORE_NAME, "relationship.itemID", "uniqueID", "item"),
Aggregates.unwind("$item"),
// filter duplicates here ?
Aggregates.lookup(Store.STORE_NAME, "item.content", "ID", "content"),
Aggregates.unwind("$content"),
Aggregates.project(Projections.fields(Projections.include("store", "item", "content")))
)
The query returns duplicate objects which is undesirable. I would like to remove these. How could I go about incorporating Aggregates.group and "$addToSet" to do this? Or any other reasonable solution would be great too.
Note: I have to omit some details about the query, so the store lookup aggregate is not there. However, I want to remove the duplicates later in the query so it hopefully shouldn't matter.
Please let me know if I need to provide more information.
Thanks.
EDIT: 31/ 07/ 2019: 13:47
I have tried the following:
val getItems = Seq (
Aggregates.lookup(Store...)...
Aggregates.lookup(Store.STORE_NAME, "relationship.itemID", "uniqueID", "item"),
Aggregates.unwind("$item"),
Aggregates.group("$item.itemID",
Accumulators.first("ID", "$ID"),
Accumulators.first("itemName", "$itemName"),
Accumulators.addToSet("item", "$item")),
Aggregates.unwind("$items"),
Aggregates.lookup(Store.STORE_NAME, "item.content", "ID", "content"),
Aggregates.unwind("$content"),
Aggregates.project(Projections.fields(Projections.include("store", "items", "content")))
)
But my query now returns zero results instead of the duplicate result.
You can use $first to remove the duplicates.
Suppose I have the following data:
[
{"_id": 1, "item": "ABC", "sizes": ["S", "M", "L"]},
{"_id": 2, "item": "EFG", "sizes": []},
{"_id": 3, "item": "IJK", "sizes": "M"},
{"_id": 4, "item": "LMN"},
{"_id": 5, "item": "XYZ", "sizes": null}
]
Now, let's aggregate it using $first and $unwind and see the difference.
First, let's aggregate it using $first:
db.collection.aggregate([
{ $sort: {
item: 1
}
},
{ $group: {
_id: "$item",firstSize: {$first: "$sizes"}}}
])
Output
[
{"_id": "XYZ","firstSize": null},
{"_id": "ABC","firstSize": ["S","M","L" ]},
{"_id": "IJK","firstSize": "M"},
{"_id": "EFG","firstSize": []},
{"_id": "LMN","firstSize": null}
]
Now, Let's aggregate it using $unwind
db.collection.aggregate([
{
$unwind: "$sizes"
}
])
Output
[
{"_id": 1,"item": "ABC","sizes": "S"},
{"_id": 1,"item": "ABC","sizes": "M"},
{"_id": 1,"item": "ABC","sizes": "L"},
{"_id": 3,"item": "IJK","sizes": "M"}
]
You can see that $first removes the duplicates, whereas $unwind keeps them.
Using $unwind and $first together:
db.collection.aggregate([
{ $unwind: "$sizes"},
{
$group: {
_id: "$item",firstSize: {$first: "$sizes"}}
}
])
Output
[
{"_id": "IJK", "firstSize": "M"},
{"_id": "ABC","firstSize": "S"}
]
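The behaviour shown in these outputs can be imitated in plain JavaScript. The sketch below is an approximation, not the server's implementation: unwind and groupFirst are hypothetical helpers, and the output order of a real $group is not guaranteed. It also shows that unwinding a field that does not exist drops every document, which may be why the attempt in the question, which collects into "item" but then unwinds "$items", returns nothing.

```javascript
// Hypothetical helper imitating $unwind's default behaviour: documents whose field is
// missing, null or an empty array are dropped; a non-array value counts as one element.
function unwind(docs, field) {
  return docs.flatMap((doc) => {
    const value = doc[field];
    if (value === undefined || value === null) return [];
    const elements = Array.isArray(value) ? value : [value];
    return elements.map((el) => ({ ...doc, [field]: el }));
  });
}

// Hypothetical helper imitating $group with $first: one output per key, first value wins.
function groupFirst(docs, key, field) {
  const seen = new Map();
  for (const doc of docs) {
    if (!seen.has(doc[key])) seen.set(doc[key], doc[field]);
  }
  return [...seen].map(([id, first]) => ({ _id: id, firstSize: first }));
}

const data = [
  { _id: 1, item: "ABC", sizes: ["S", "M", "L"] },
  { _id: 2, item: "EFG", sizes: [] },
  { _id: 3, item: "IJK", sizes: "M" },
  { _id: 4, item: "LMN" },
  { _id: 5, item: "XYZ", sizes: null },
];

console.log(unwind(data, "sizes").length); // 4
console.log(groupFirst(unwind(data, "sizes"), "item", "sizes"));
// [ { _id: 'ABC', firstSize: 'S' }, { _id: 'IJK', firstSize: 'M' } ]
console.log(unwind(data, "items").length); // 0, because no document has an "items" field
```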
Grouping and then using $addToSet is an effective way to deal with your problem.
It looks like this in the mongo shell:
db.sales.aggregate(
[
{
$group:
{
_id: { day: { $dayOfYear: "$date"}, year: { $year: "$date" } },
itemsSold: { $addToSet: "$item" }
}
}
]
)
In Scala you can do it like this:
Aggregates.group("$groupfield", Accumulators.addToSet("fieldName", "$expression"))
If you have multiple fields to group by:
Aggregates.group(new BasicDBObject().append("fieldAname", "$fieldA").append("fieldBname", "$fieldB"), Accumulators.addToSet("fieldName", "$expression"))
Then unwind the resulting set.

In Mongo DB Getting Whole collection of date even if one is matched inside it [duplicate]

MongoDB returns the whole dates array even if only one element inside it matches.
Creating a new collection with the data below:
db.details.insert({
"_id": 1,
"name": "johnson",
"dates": [
{"date": ISODate("2016-05-01")},
{"date": ISODate("2016-08-01")}
]
})
Fetching Back:
db.details.find().pretty()
Output:
{
"_id": 1,
"name": "johnson",
"dates": [
{"date": ISODate("2016-05-01T00:00:00Z")},
{"date": ISODate("2016-08-01T00:00:00Z")}
]
}
So here there is an array called dates inside the documents of the details collection.
Now I want to filter the entries inside dates using a greater-than comparison, and I want the result to show only "2016-08-01".
But when I search like the following:
db.details.find(
{"dates.date": {$gt: ISODate("2016-07-01")}},
{"dates.date": 1, "_id": 0}
).pretty()
I get the result below. It gives me the entire array even if only one date in it matches:
{
"dates": [
{"date": ISODate("2016-05-01T00:00:00Z")},
{"date": ISODate("2016-08-01T00:00:00Z")}
]
}
Please help me get the expected data, i.e.:
{
"date": ISODate("2016-08-01T00:00:00Z")
}
You can use the aggregation framework for this:
db.details.aggregate([
{$unwind: '$dates'},
{$match: {'dates.date': {$gt: ISODate("2016-07-01")}}},
{$project: {_id: 0, 'dates.date': 1}}
]);
Another way (requires MongoDB 3.2 or later):
db.details.aggregate([
{$project: {
_id: 0,
dates: {
$filter: {
input: '$dates',
as: 'item',
cond: {
$gte: ['$$item.date', ISODate('2016-08-01T00:00:00Z')]
}
}
}
}
}]);
To only return the date field:
db.details.aggregate([
{$unwind: '$dates'},
{$match: {'dates.date': {$gt: ISODate('2016-07-01')}}},
{$group: {_id: '$dates.date'}},
{$project: {_id: 0, date: '$_id'}}
]);
Returns:
{
"date" : ISODate("2016-08-01T00:00:00Z")
}
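The element-level filtering that these pipelines perform can be pictured in plain JavaScript. This is a small in-memory sketch, not driver code: filterDates is a hypothetical helper and the document mirrors the one inserted in the question, with JavaScript Date objects standing in for ISODate.

```javascript
// In-memory copy of the question's document; Date stands in for ISODate.
const details = {
  _id: 1,
  name: "johnson",
  dates: [
    { date: new Date("2016-05-01") },
    { date: new Date("2016-08-01") },
  ],
};

// Hypothetical helper: keep only the array elements whose date is after the cutoff,
// which is the per-element test that find() on "dates.date" does not apply to the output.
function filterDates(doc, after) {
  return doc.dates.filter((d) => d.date > after);
}

const result = filterDates(details, new Date("2016-07-01"));
console.log(result.map((d) => d.date.toISOString()));
// [ '2016-08-01T00:00:00.000Z' ]
```

The find() query matches the document as a whole (at least one element qualifies), whereas the filter above is evaluated once per array element.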

How to find out key-value which matches to given value for a week using mongo?

I have following collection structure -
[{
"runtime":1417510501850,
"vms":[{
"name":"A",
"state":"on",
},{
"name":"B",
"state":"off",
}]
},
{
"runtime":1417510484000,
"vms":[{
"name":"A",
"state":"on",
}, {
"name":"B",
"state":"off",
}]
},{
"runtime":1417510184000,
"vms":[{
"name":"A",
"state":"off",
}, {
"name":"B",
"state":"off",
}]
},{
"runtime":1417509884000,
"vms":[{
"name":"A",
"state":"on",
}, {
"name":"B",
"state":"off",
}]
},{
"runtime":1416905084000,
"vms":[{
"name":"A",
"state":"on",
}, {
"name":"B",
"state":"off",
}]
}
]
Consecutive documents are roughly 5 minutes apart, as given by 'runtime'.
I have many such documents.
I want to find the names whose state has been off for a week. The only condition is that the state should be off throughout the week (there should not be a single 'on' value for the key state).
e.g. in the data above, if name 'B' has been off for one week (taking 1417510501850 as the current timestamp), then my expected output would be -
{
"name":"B",
"state":"off"
}
Currently I am doing the following:
1) Find documents with state 'off' whose runtime is greater than (currentTimestamp - 60*60*24*7)
2) Loop over the result to find the name and check the state.
Can anybody help me get the above output?
I suppose the query should be like this:
db.yourcollection.aggregate([{$unwind: "$vms"}, //unwind for convenience
{$match: {"vms.state": {$eq: "off"}, runtime: {$gte: Date.now() - 7*24*60*60*1000}}}, //filter: state "off" within the last week
{$project: {name: "$vms.name", state: "$vms.state"}}]) //projection
UPDATE
This is the corrected query, which returns only the names that did not have an "on" status during the week. It is a bit more involved; see the comments:
db.yourcollection.aggregate([{$unwind: "$vms"}, //unwind for convenience
{$match: {runtime: {$gte: Date.now() - 7*24*60*60*1000}}}, //filter for the period: the last week
{$project: {_id: "$vms.name", state: "$vms.state"}}, //projection so docs will be like {_id: "a", state: "on"}
{$group: {_id: "$_id", states: {$push: "$state"}}}, //group by id to see all states in an array
{$match: {states: {$eq: "off", $ne: "on"}}}, //take only docs which have state "off" and do not have state "on"
{$project: {_id: "$_id", state: {$literal: "off"}}}]) //and convert to the required output
To understand this query, it is a good idea to add the stages to the aggregate call one at a time and check the result after each.
Hope this helps.
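The "off throughout the week" logic can also be checked in plain JavaScript against the sample data. This is a sketch under assumptions: offAllWeek is a hypothetical helper, the window is taken to be the 7 days up to the given timestamp, and only the runtime, name and state fields from the question are used.

```javascript
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// Sample documents from the question (runtime in milliseconds).
const runs = [
  { runtime: 1417510501850, vms: [{ name: "A", state: "on" }, { name: "B", state: "off" }] },
  { runtime: 1417510484000, vms: [{ name: "A", state: "on" }, { name: "B", state: "off" }] },
  { runtime: 1417510184000, vms: [{ name: "A", state: "off" }, { name: "B", state: "off" }] },
  { runtime: 1417509884000, vms: [{ name: "A", state: "on" }, { name: "B", state: "off" }] },
  { runtime: 1416905084000, vms: [{ name: "A", state: "on" }, { name: "B", state: "off" }] },
];

// Hypothetical helper: collect the set of states seen per name within the last week,
// then keep the names that were seen "off" and never "on".
function offAllWeek(docs, now) {
  const statesByName = new Map();
  for (const doc of docs) {
    if (doc.runtime < now - WEEK_MS) continue; // older than one week: ignore
    for (const vm of doc.vms) {
      if (!statesByName.has(vm.name)) statesByName.set(vm.name, new Set());
      statesByName.get(vm.name).add(vm.state);
    }
  }
  return [...statesByName]
    .filter(([, states]) => states.has("off") && !states.has("on"))
    .map(([name]) => ({ name, state: "off" }));
}

console.log(offAllWeek(runs, 1417510501850)); // [ { name: 'B', state: 'off' } ]
```

Collecting the states per name before filtering is the same idea as the $group with $push followed by the $match on the states array.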