Assume there are 4 users in the collection.
> db.users.find().pretty()
{
"_id" : ObjectId("5d369b451b48d91cba76c618"),
"user_id" : 1,
"final_score" : 65,
"max_score" : 15,
"min_score" : 15,
}
{
"_id" : ObjectId("5d369b451b48d91cba76c619"),
"user_id" : 2,
"final_score" : 70,
"max_score" : 15,
"min_score" : 15,
}
{
"_id" : ObjectId("5d369b451b48d91cba76c61a"),
"user_id" : 3,
"final_score" : 60,
"max_score" : 15,
"min_score" : 15,
}
{
"_id" : ObjectId("5d369b451b48d91cba76c61b"),
"user_id" : 4,
"final_score" : 83,
"max_score" : 15,
"min_score" : 15,
}
I want to extract the users that meet both of the conditions below.
final_score <= user_id=3's final_score + each document's max_score
final_score >= user_id=3's final_score - each document's min_score
In MySQL this is very simple:
SELECT * FROM users
WHERE final_score <= 60 + users.max_score AND final_score >= 60 - users.min_score
But how can I write this query in MongoDB?
Thanks.
EDIT
I thought it could be done with something like this, so I wrote the following query:
db.users.find({
'final_score': {
'$lte': '60 + this.max_score',
'$gte': '60 - this.min_score'
}
})
But it returns nothing.
The difficulty here comes from the fact that you need to run two separate pipelines (one to get the value for user 3 and a second one to filter all documents). In the Aggregation Framework you can do that with the $facet operator, which lets you run multiple pipelines and then keep processing the data in subsequent stages. To compare the data you can use $filter, and to get the original document shape back you need to turn the nested array into separate documents using $unwind and $replaceRoot:
db.users.aggregate([
{
$facet: {
user3: [
{ $match: { user_id: 3 } }
],
docs: [
{ $match: {} }
]
}
},
{
$addFields: {
user3: { $arrayElemAt: [ "$user3", 0 ] }
}
},
{
$project: {
docs: {
$filter: {
input: "$docs",
cond: {
$and: [
{ $lte: [ "$$this.final_score", { $add: [ "$user3.final_score", "$$this.max_score" ] } ] },
{ $gte: [ "$$this.final_score", { $subtract: [ "$user3.final_score", "$$this.min_score" ] } ] },
]
}
}
}
}
},
{
$unwind: "$docs"
},
{
$replaceRoot: {
newRoot: "$docs"
}
}
])
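As a cross-check on what the pipeline above should return, here is a minimal plain-JavaScript sketch of the same two-step logic: look up user 3 first, then filter every document against that reference. The array `users` and the helper `withinRange` are illustrative names (not MongoDB APIs), and the sketch assumes the lower bound uses min_score, as in the SQL version.

```javascript
// Sample documents copied from the question (without _id).
const users = [
  { user_id: 1, final_score: 65, max_score: 15, min_score: 15 },
  { user_id: 2, final_score: 70, max_score: 15, min_score: 15 },
  { user_id: 3, final_score: 60, max_score: 15, min_score: 15 },
  { user_id: 4, final_score: 83, max_score: 15, min_score: 15 },
];

// Hypothetical helper: keep documents whose final_score lies in
// [ref.final_score - min_score, ref.final_score + max_score].
function withinRange(docs, refUserId) {
  // Step 1: the "user3" facet, i.e. find the reference document.
  const ref = docs.find((d) => d.user_id === refUserId);
  if (!ref) return [];
  // Step 2: the $filter over all documents.
  return docs.filter(
    (d) =>
      d.final_score <= ref.final_score + d.max_score &&
      d.final_score >= ref.final_score - d.min_score
  );
}

const matchedIds = withinRange(users, 3).map((d) => d.user_id);
console.log(matchedIds); // [ 1, 2, 3 ]
```

User 4 is excluded because 83 is above 60 + 15.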
From your description, I guess you already know the score of user3 is 60.
In this case:
db.collection.aggregate([
{
$addFields: {
match: {
$and: [
{
$gte: [
"$final_score",
{
$subtract: [
60,
"$min_score"
]
}
]
},
{
$lte: [
"$final_score",
{
$add: [
60,
"$max_score"
]
}
]
}
]
}
}
},
{
$match: {
match: true
}
},
{
$project: {
match: 0
}
}
])
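If only filtering is needed, a single find with $expr (available since MongoDB 3.6) may be enough, because $expr lets a query compare fields of the same document using aggregation expressions. The sketch below assumes the reference score 60 is already known; the variable name `filter` is only for illustration.

```javascript
// Query document. In the mongo shell it would be passed as:
//   db.users.find(filter)
const filter = {
  $expr: {
    $and: [
      // final_score <= 60 + max_score
      { $lte: ["$final_score", { $add: [60, "$max_score"] }] },
      // final_score >= 60 - min_score
      { $gte: ["$final_score", { $subtract: [60, "$min_score"] }] },
    ],
  },
};

console.log(JSON.stringify(filter, null, 2));
```

This also suggests why the attempt in the question returns nothing: '60 + this.max_score' is an ordinary string there, so the numeric final_score is compared against a string literal rather than against a computed value.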
{
_id: ObjectId("5dbdacc28cffef0b94580dbd"),
"comments" : [
{
"_id" : ObjectId("5dbdacc78cffef0b94580dbf"),
"replies" : [
{
"_id" : ObjectId("5dbdacd78cffef0b94580dc0")
},
]
},
]
}
How can I count the number of elements in comments and add it to the number of replies?
My approach is to run 2 queries like this:
1. count the total elements of replies
db.posts.aggregate([
{$match: {_id:ObjectId("5dbdacc28cffef0b94580dbd")}},
{ $unwind: "$comments",},
{$project:{total:{$size:"$comments.replies"} , _id: 0} }
])
2. count total elements of comments
db.posts.aggregate([
{$match: {_id:ObjectId("5dbdacc28cffef0b94580dbd")}},
{$project:{total:{$size:"$comments"} , _id: 0} }
])
Then I sum up both results. Is there a better way to write this as a single query that returns the total number of comments plus replies?
You can use $reduce and $concatArrays to "merge" an inner "array of arrays" into a single list and measure the $size of that. Then simply $add the two results together:
db.posts.aggregate([
{ "$match": { _id:ObjectId("5dbdacc28cffef0b94580dbd") } },
{ "$addFields": {
"totalBoth": {
"$add": [
{ "$size": "$comments" },
{ "$size": {
"$reduce": {
"input": "$comments.replies",
"initialValue": [],
"in": {
"$concatArrays": [ "$$value", "$$this" ]
}
}
}}
]
}
}}
])
Note that an expression like $comments.replies yields an "array of arrays", hence the extra step to merge these into a single array whose elements can all be counted.
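The same flatten-then-count idea can be sketched in plain JavaScript, where Array.prototype.reduce and concat play the roles of $reduce and $concatArrays. The document `post` below is hypothetical sample data, not taken from the question.

```javascript
// Hypothetical post with 2 comments holding 1 and 2 replies.
const post = {
  comments: [
    { _id: "c1", replies: [{ _id: "r1" }] },
    { _id: "c2", replies: [{ _id: "r2" }, { _id: "r3" }] },
  ],
};

// Counterpart of "$comments.replies": an array of arrays.
const repliesPerComment = post.comments.map((c) => c.replies);

// Counterpart of $reduce + $concatArrays: fold into one flat list.
const allReplies = repliesPerComment.reduce(
  (value, current) => value.concat(current),
  []
);

// Counterpart of $add over the two $size results.
const totalBoth = post.comments.length + allReplies.length;
console.log(totalBoth); // 5
```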
Try using the $unwind to flatten the list you get from the $project before using $count.
This is another way of getting the result.
Input documents:
{ "_id" : 1, "array1" : [ { "array2" : [ { id: "This is a test!"}, { id: "test1" } ] }, { "array2" : [ { id: "This is 2222!"}, { id: "test 222" }, { id: "222222" } ] } ] }
{ "_id" : 2, "array1" : [ { "array2" : [ { id: "aaaa" }, { id: "bbbb" } ] } ] }
The query:
db.arrsizes2.aggregate( [
{ $facet: {
array1Sizes: [
{ $project: { array1Size: { $size: "$array1" } } }
],
array2Sizes: [
{ $unwind: "$array1" },
{ $project: { array2Size: { $size: "$array1.array2" } } },
],
} },
{ $project: { result: { $concatArrays: [ "$array1Sizes", "$array2Sizes" ] } } },
{ $unwind: "$result" },
{ $group: { _id: "$result._id", total1: { $sum: "$result.array1Size" }, total2: { $sum: "$result.array2Size" } } },
{ $addFields: { total: { $add: [ "$total1", "$total2" ] } } },
] )
The output:
{ "_id" : 2, "total1" : 1, "total2" : 2, "total" : 3 }
{ "_id" : 1, "total1" : 2, "total2" : 5, "total" : 7 }
Our students' scores in 4 fields are modeled like this:
{
"_id" : xxx,
"student" : "Private Ryan",
"math" : 9,
"history" : 8,
"literature" : 6,
"science" : 10
}
The task is to count how many students performed well, averagely, or badly, given:
Good: average score >= 8 points
Bad: average score < 5 points
If possible, it would be nice to bucket them too.
You can use $addFields and $let to define "label" for every student. To apply conditional logic you can take advantage of $switch or double $cond. Then you need to run $group to count them and also you can use $push to get entire documents in final result:
db.collection.aggregate([
{
$addFields: {
label: {
$let: {
vars: {
avg: {
$divide: [ { $sum: [ "$math", "$history", "$literature", "$science" ] }, 4 ]
}
},
in: {
$cond: [
{ $gte: [ "$$avg", 8 ] },
"good",
{ $cond: [ { $lt: [ "$$avg", 5 ] }, "bad", "average" ] }
]
}
}
}
}
},
{
$group: {
_id: "$label",
count: { $sum: 1 },
students: { $push: "$$ROOT" }
}
}
])
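To make the labelling rule concrete, here is a small plain-JavaScript sketch of the same logic: compute the average, pick a label, then count and collect per label. Apart from "Private Ryan", the students are hypothetical sample data, and `label` and `buckets` are illustrative names.

```javascript
// Sample data in the shape shown in the question (extra rows are made up).
const students = [
  { student: "Private Ryan", math: 9, history: 8, literature: 6, science: 10 },
  { student: "Student A", math: 5, history: 6, literature: 7, science: 6 },
  { student: "Student B", math: 3, history: 4, literature: 5, science: 2 },
];

// Counterpart of the $let / nested $cond expression.
function label(s) {
  const avg = (s.math + s.history + s.literature + s.science) / 4;
  if (avg >= 8) return "good";
  if (avg < 5) return "bad";
  return "average";
}

// Counterpart of $group with { $sum: 1 } and $push.
const buckets = {};
for (const s of students) {
  const key = label(s);
  if (!buckets[key]) buckets[key] = { count: 0, students: [] };
  buckets[key].count += 1;
  buckets[key].students.push(s.student);
}

console.log(buckets);
```

With this data each bucket holds exactly one student: the averages are 8.25, 6 and 3.5.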
I am trying to find a way to get the minimum number of orders between
2019-03-17 and 2019-03-19, excluding 2019-03-15 from the results.
{
"_id" : ObjectId("5c8ffdadde62bf097d54ec47"),
"productId" : "32886845998",
"orders" : [
{
"date" : ISODate("2019-03-15T00:00:00.000+0000"),
"orders" : NumberInt(9)
},
{
"date" : ISODate("2019-03-17T00:00:00.000+0000"),
"orders" : NumberInt(21)
},
{
"date" : ISODate("2019-03-18T00:00:00.000+0000"),
"orders" : NumberInt(20)
},
{
"date" : ISODate("2019-03-19T00:00:00.000+0000"),
"orders" : NumberInt(30)
}
]
}
I tried using the $min and $max operators, but that didn't help because they iterate through the full array to find the maximum and minimum:
db.products.aggregate([
{
$project: {
maximum: {
$reduce: {
input: "$orders",
initialValue: 0,
in: {
$max: [
"$$value",
{
$cond: [
{ $gte: [ "$$this.date", ISODate("2019-03-17T00:00:00.000+0000") ] },
"$$this.orders",
0
]
}
]
}
}
}
}
}
])
You can use $filter to apply filtering by orders.date and then you can apply $min and $max on filtered set:
db.col.aggregate([
{
$project: {
filteredOrders: {
$filter: {
input: "$orders",
cond: {
$and: [
{ $gte: [ "$$this.date", ISODate("2019-03-17T00:00:00.000+0000") ] },
{ $lte: [ "$$this.date", ISODate("2019-03-19T00:00:00.000+0000") ] },
]
}
}
}
}
},
{
$project: {
min: { $min: "$filteredOrders.orders" },
max: { $max: "$filteredOrders.orders" },
}
}
])
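The filter-first, aggregate-second order can be checked with a short plain-JavaScript sketch on the sample document from the question. The names `from`, `to` and `counts` are illustrative, and returning null for an empty range is a choice made in this sketch.

```javascript
// Sample document from the question, with ISODate replaced by Date.
const product = {
  productId: "32886845998",
  orders: [
    { date: new Date("2019-03-15T00:00:00.000Z"), orders: 9 },
    { date: new Date("2019-03-17T00:00:00.000Z"), orders: 21 },
    { date: new Date("2019-03-18T00:00:00.000Z"), orders: 20 },
    { date: new Date("2019-03-19T00:00:00.000Z"), orders: 30 },
  ],
};

const from = new Date("2019-03-17T00:00:00.000Z");
const to = new Date("2019-03-19T00:00:00.000Z");

// Counterpart of $filter: keep only entries inside the date range.
const counts = product.orders
  .filter((o) => o.date >= from && o.date <= to)
  .map((o) => o.orders);

// Counterpart of $min / $max over the filtered values.
const result =
  counts.length > 0
    ? { min: Math.min(...counts), max: Math.max(...counts) }
    : { min: null, max: null };

console.log(result); // { min: 20, max: 30 }
```

The entry for 2019-03-15 (9 orders) never reaches the min/max step, which is the point of filtering first.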
I am trying to update my MongoDB collection, which has the following structure.
{
"_id" : ObjectId("5a64d076bfd103df081967ae"),
"values" : [
{
"date" : "2018-01-22",
"Price" : "1289.4075"
},
{
"date" : "2018-01-22",
"Price" : "1289.4075"
},
{
"date" : "2015-05-18",
"Price" : 1289.41
}
],
"Code" : 123456,
"schemeStatus" : "Inactive"
}
I want to compare the date values of the first 2 array elements, i.e. values[0].date and values[1].date. If they match, I want to delete values[0] so that there is only 1 entry with that date.
You can use an aggregation pipeline with $out as the last stage to update your collection:
db.collection.aggregate([
{
$addFields: {
sameDate: {
$let: {
vars: {
fst: { $arrayElemAt: [ "$values", 0 ] },
snd: { $arrayElemAt: [ "$values", 1 ] }
},
in: { $cond: { if: { $eq: [ "$$fst.date", "$$snd.date" ] }, then: 1, else: 0 } }
}
}
}
},
{
$project: {
_id: 1,
values : { $cond: { if: { $eq: [ "$sameDate", 0 ] }, then: "$values", else: { $slice: [ "$values", 1, { $size: "$values" } ] } } },
Code: 1,
schemeStatus: 1
}
},
{ $out: "collection" }
])
Some more important operators used here:
$cond to handle if-else logic
$let to define some helper variables
$arrayElemAt to get first and second element
$slice to pop first element
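The per-document transformation can be sketched in plain JavaScript as follows. `dropDuplicateHead` is a hypothetical helper name, and the guard for arrays with fewer than two elements is an addition of this sketch, not something the pipeline above does explicitly.

```javascript
// Sample document from the question (without _id).
const doc = {
  values: [
    { date: "2018-01-22", Price: "1289.4075" },
    { date: "2018-01-22", Price: "1289.4075" },
    { date: "2015-05-18", Price: 1289.41 },
  ],
  Code: 123456,
  schemeStatus: "Inactive",
};

// Hypothetical helper: drop values[0] when it has the same date as values[1].
function dropDuplicateHead(d) {
  const [first, second] = d.values;
  const sameDate =
    first !== undefined && second !== undefined && first.date === second.date;
  // slice(1) is the counterpart of $slice: [ "$values", 1, <size> ].
  return { ...d, values: sameDate ? d.values.slice(1) : d.values };
}

const updated = dropDuplicateHead(doc);
console.log(updated.values.map((v) => v.date)); // [ '2018-01-22', '2015-05-18' ]
```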
Sample Documents:
{ time: ISODate("2013-10-10T20:55:36Z"), value: 1 }
{ time: ISODate("2013-10-10T22:43:16Z"), value: 2 }
{ time: ISODate("2013-10-11T19:12:66Z"), value: 3 }
{ time: ISODate("2013-10-11T10:15:38Z"), value: 4 }
{ time: ISODate("2013-10-12T04:15:38Z"), value: 5 }
It's easy to get the aggregated results that is grouped by date.
But what I want is to query results that returns a running total
of the aggregation, like:
{ time: "2013-10-10", total: 3, runningTotal: 3 }
{ time: "2013-10-11", total: 7, runningTotal: 10 }
{ time: "2013-10-12", total: 5, runningTotal: 15 }
Is this possible with the MongoDB Aggregation?
EDIT: Since MongoDB v5.0 the preferred approach would be to use the new $setWindowFields aggregation stage as shared by Xavier Guihot.
This does what you need. I have normalised the times in the data so that they group together. The idea is to $group and push the times and totals into separate arrays. Then $unwind the time array, which gives each time document its own copy of the totals array. You can then calculate the runningTotal (or something like a rolling average) from the array containing the data for all the different times. The 'index' generated by $unwind is the array index of the total corresponding to that time. It is important to $sort before $unwinding, since this ensures the arrays are in the correct order.
db.temp.aggregate(
[
{
'$group': {
'_id': '$time',
'total': { '$sum': '$value' }
}
},
{
'$sort': {
'_id': 1
}
},
{
'$group': {
'_id': 0,
'time': { '$push': '$_id' },
'totals': { '$push': '$total' }
}
},
{
'$unwind': {
'path' : '$time',
'includeArrayIndex' : 'index'
}
},
{
'$project': {
'_id': 0,
'time': { '$dateToString': { 'format': '%Y-%m-%d', 'date': '$time' } },
'total': { '$arrayElemAt': [ '$totals', '$index' ] },
'runningTotal': { '$sum': { '$slice': [ '$totals', { '$add': [ '$index', 1 ] } ] } },
}
},
]
);
I have used something similar on a collection with ~80 000 documents, aggregating to 63 results. I am not sure how well it will work on larger collections, but I have found that performing transformations (projections, array manipulations) on aggregated data does not seem to have a large performance cost once the data is reduced to a manageable size.
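The index-plus-slice step is the core of this approach, and it can be sketched in plain JavaScript. The arrays `days` and `totals` stand in for the two pushed arrays after the $sort; their values come from the expected output in the question, while the variable names are illustrative.

```javascript
// Counterparts of the pushed 'time' and 'totals' arrays, already sorted.
const days = ["2013-10-10", "2013-10-11", "2013-10-12"];
const totals = [3, 7, 5];

// Counterpart of $unwind with includeArrayIndex followed by $project:
// each day gets its own total plus the sum of totals[0..index].
const rows = days.map((time, index) => ({
  time,
  total: totals[index],
  runningTotal: totals.slice(0, index + 1).reduce((a, b) => a + b, 0),
}));

console.log(rows.map((r) => r.runningTotal)); // [ 3, 10, 15 ]
```

Each row re-sums a prefix of the totals array, which is cheap for a few dozen groups but grows quadratically with the number of groups.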
Here is another approach.
pipeline
db.col.aggregate([
{$group : {
_id : { time :{ $dateToString: {format: "%Y-%m-%d", date: "$time", timezone: "-05:00"}}},
value : {$sum : "$value"}
}},
{$addFields : {_id : "$_id.time"}},
{$sort : {_id : 1}},
{$group : {_id : null, data : {$push : "$$ROOT"}}},
{$addFields : {data : {
$reduce : {
input : "$data",
initialValue : {total : 0, d : []},
in : {
total : {$sum : ["$$this.value", "$$value.total"]},
d : {$concatArrays : [
"$$value.d",
[{
_id : "$$this._id",
value : "$$this.value",
runningTotal : {$sum : ["$$value.total", "$$this.value"]}
}]
]}
}
}
}}},
{$unwind : "$data.d"},
{$replaceRoot : {newRoot : "$data.d"}}
]).pretty()
collection
> db.col.find()
{ "_id" : ObjectId("4f442120eb03305789000000"), "time" : ISODate("2013-10-10T20:55:36Z"), "value" : 1 }
{ "_id" : ObjectId("4f442120eb03305789000001"), "time" : ISODate("2013-10-11T04:43:16Z"), "value" : 2 }
{ "_id" : ObjectId("4f442120eb03305789000002"), "time" : ISODate("2013-10-12T03:13:06Z"), "value" : 3 }
{ "_id" : ObjectId("4f442120eb03305789000003"), "time" : ISODate("2013-10-11T10:15:38Z"), "value" : 4 }
{ "_id" : ObjectId("4f442120eb03305789000004"), "time" : ISODate("2013-10-13T02:15:38Z"), "value" : 5 }
result
{ "_id" : "2013-10-10", "value" : 3, "runningTotal" : 3 }
{ "_id" : "2013-10-11", "value" : 7, "runningTotal" : 10 }
{ "_id" : "2013-10-12", "value" : 5, "runningTotal" : 15 }
>
Here is a solution that does not push the previous documents into a new array and then process them. (If the array gets too big, you can exceed the maximum BSON document size of 16MB.)
Calculating running totals is as simple as:
db.collection1.aggregate(
[
{
$lookup: {
from: 'collection1',
let: { date_to: '$time' },
pipeline: [
{
$match: {
$expr: {
$lt: [ '$time', '$$date_to' ]
}
}
},
{
$group: {
_id: null,
summary: {
$sum: '$value'
}
}
}
],
as: 'sum_prev_days'
}
},
{
$addFields: {
sum_prev_days: {
$arrayElemAt: [ '$sum_prev_days', 0 ]
}
}
},
{
$addFields: {
running_total: {
$sum: [ '$value', '$sum_prev_days.summary' ]
}
}
},
{
$project: { sum_prev_days: 0 }
}
]
)
What we did: within the lookup we selected all documents with smaller datetime and immediately calculated the sum (using $group as the second step of lookup's pipeline). The $lookup put the value into the first element of an array. We pull the first array element and then calculate the sum: current value + sum of previous values.
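The self-join idea can be sketched in plain JavaScript: for every document, sum the values of all documents with an earlier time and add the document's own value. The timestamps below are hypothetical sample data and `docs` / `withRunningTotal` are illustrative names.

```javascript
// Hypothetical sample data with distinct timestamps.
const docs = [
  { time: new Date("2013-10-10T20:55:36Z"), value: 1 },
  { time: new Date("2013-10-10T22:43:16Z"), value: 2 },
  { time: new Date("2013-10-11T10:15:38Z"), value: 4 },
  { time: new Date("2013-10-12T04:15:38Z"), value: 5 },
];

const withRunningTotal = docs.map((doc) => {
  // Counterpart of the $lookup sub-pipeline: $match on earlier times,
  // then $group to a single sum.
  const sumPrev = docs
    .filter((other) => other.time < doc.time)
    .reduce((sum, other) => sum + other.value, 0);
  // Counterpart of the final $addFields: own value + sum of earlier values.
  return { ...doc, running_total: doc.value + sumPrev };
});

console.log(withRunningTotal.map((d) => d.running_total)); // [ 1, 3, 7, 12 ]
```

In this sketch every document scans the whole array again, so the work grows roughly with the square of the number of documents; that is the price paid for never building one large array.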
If you would like to group the transactions by day first and then calculate running totals, insert a $group stage at the beginning of the pipeline and also at the beginning of the $lookup's pipeline.
db.collection1.aggregate(
[
{
$group: {
_id: {
$substrBytes: ['$time', 0, 10]
},
value: {
$sum: '$value'
}
}
},
{
$lookup: {
from: 'collection1',
let: { date_to: '$_id' },
pipeline: [
{
$group: {
_id: {
$substrBytes: ['$time', 0, 10]
},
value: {
$sum: '$value'
}
}
},
{
$match: {
$expr: {
$lt: [ '$_id', '$$date_to' ]
}
}
},
{
$group: {
_id: null,
summary: {
$sum: '$value'
}
}
}
],
as: 'sum_prev_days'
}
},
{
$addFields: {
sum_prev_days: {
$arrayElemAt: [ '$sum_prev_days', 0 ]
}
}
},
{
$addFields: {
running_total: {
$sum: [ '$value', '$sum_prev_days.summary' ]
}
}
},
{
$project: { sum_prev_days: 0 }
}
]
)
The result is:
{ "_id" : "2013-10-10", "value" : 3, "running_total" : 3 }
{ "_id" : "2013-10-11", "value" : 7, "running_total" : 10 }
{ "_id" : "2013-10-12", "value" : 5, "running_total" : 15 }
Starting in MongoDB 5.0, this is a perfect use case for the new $setWindowFields aggregation stage:
// { time: ISODate("2013-10-10T20:55:36Z"), value: 1 }
// { time: ISODate("2013-10-10T22:43:16Z"), value: 2 }
// { time: ISODate("2013-10-11T12:12:66Z"), value: 3 }
// { time: ISODate("2013-10-11T10:15:38Z"), value: 4 }
// { time: ISODate("2013-10-12T05:15:38Z"), value: 5 }
db.collection.aggregate([
{ $group: {
_id: { $dateToString: { format: "%Y-%m-%d", date: "$time" } },
total: { $sum: "$value" }
}},
// e.g.: { "_id" : "2013-10-11", "total" : 7 }
{ $set: { "date": "$_id" } }, { $unset: ["_id"] },
// e.g.: { "date" : "2013-10-11", "total" : 7 }
{ $setWindowFields: {
sortBy: { date: 1 },
output: {
running: {
$sum: "$total",
window: { documents: [ "unbounded", "current" ] }
}
}
}}
])
// { date: "2013-10-10", total: 3, running: 3 }
// { date: "2013-10-11", total: 7, running: 10 }
// { date: "2013-10-12", total: 5, running: 15 }
Let's focus on the $setWindowFields stage, which:
chronologically sorts the grouped documents by date: sortBy: { date: 1 }
adds the running field to each document: output: { running: { ... } }
which is the $sum of the totals: $sum: "$total"
computed over a specified span of documents (the window)
which in our case is the current document plus every document before it: window: { documents: [ "unbounded", "current" ] }
where [ "unbounded", "current" ] means the window covers all documents between the first document (unbounded) and the current document (current).
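To illustrate how the window bounds shape the result, here is a plain-JavaScript sketch of a documents window. `windowSum` and `toOffset` are hypothetical helpers that only mimic the behaviour described above for "unbounded", "current" and integer offsets; they are not MongoDB APIs.

```javascript
// Grouped documents in arbitrary order, as they might leave $group.
const grouped = [
  { date: "2013-10-11", total: 7 },
  { date: "2013-10-10", total: 3 },
  { date: "2013-10-12", total: 5 },
];

// Counterpart of sortBy: { date: 1 }.
const sorted = [...grouped].sort((a, b) => a.date.localeCompare(b.date));

// Translate a bound into an index relative to position i.
function toOffset(bound, i, edge) {
  if (bound === "unbounded") return edge;
  if (bound === "current") return i;
  return i + bound; // integer offset relative to the current document
}

// Hypothetical helper: sum `field` over the window [lower, upper].
function windowSum(docs, field, lower, upper) {
  return docs.map((doc, i) => {
    const start = Math.max(0, toOffset(lower, i, 0));
    const end = Math.min(docs.length - 1, toOffset(upper, i, docs.length - 1));
    let sum = 0;
    for (let j = start; j <= end; j++) sum += docs[j][field];
    return { ...doc, running: sum };
  });
}

const running = windowSum(sorted, "total", "unbounded", "current");
console.log(running.map((d) => d.running)); // [ 3, 10, 15 ]

// A narrower window (previous + current document) gives a moving sum.
const moving = windowSum(sorted, "total", -1, "current");
console.log(moving.map((d) => d.running)); // [ 3, 10, 12 ]
```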