Suppose db.invoices contains these documents:
{ "customer" : "john", "price" : 4, "weekday": "WED" }
{ "customer" : "john", "price" : 8, "weekday": "SUN" }
{ "customer" : "john", "price" : 6, "weekday": "SAT" }
{ "customer" : "john", "price" : 5, "weekday": "SUN" }
{ "customer" : "bob", "price" : 10, "weekday": "SAT" }
{ "customer" : "bob", "price" : 15, "weekday": "MON" }
How can I query for the document with the maximum price for each customer? For the sample above, the expected result is:
[ {
"customer": "bob",
"price": 15,
"weekday": "MON"
}, {
"customer": "john",
"price": 8,
"weekday": "SUN"
} ]
I can't figure out how to do this with the aggregation framework.
Edit 1: The problem is getting the weekday along with the customer name. I do not want the maximum price alone.
Because you want to include weekday, you need to pre-sort the docs so that the doc you want from each group comes first, and then use $group with $first:
db.invoices.aggregate([
{$sort: {customer: 1, price: -1}},
{$group: {
_id: '$customer',
price: {$first: '$price'},
weekday: {$first: '$weekday'}
}}
])
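To make the sort-then-take-first idea concrete, here is a minimal in-memory sketch in plain JavaScript (no MongoDB involved). The helper name maxPricePerCustomer is hypothetical and only mirrors what the pipeline above does; it is not part of any MongoDB API.

```javascript
// Sample documents copied from the question.
const invoices = [
  { customer: "john", price: 4, weekday: "WED" },
  { customer: "john", price: 8, weekday: "SUN" },
  { customer: "john", price: 6, weekday: "SAT" },
  { customer: "john", price: 5, weekday: "SUN" },
  { customer: "bob", price: 10, weekday: "SAT" },
  { customer: "bob", price: 15, weekday: "MON" },
];

// Hypothetical helper: keep the highest-priced document per customer.
function maxPricePerCustomer(docs) {
  // Mirrors {$sort: {customer: 1, price: -1}}.
  const sorted = [...docs].sort(
    (a, b) => a.customer.localeCompare(b.customer) || b.price - a.price
  );
  // Mirrors $group with $first: the first document seen per customer wins.
  const firstByCustomer = new Map();
  for (const doc of sorted) {
    if (!firstByCustomer.has(doc.customer)) {
      firstByCustomer.set(doc.customer, doc);
    }
  }
  return [...firstByCustomer.values()];
}

console.log(maxPricePerCustomer(invoices));
```

Because the price sort is descending within each customer, the first document in each group is the one with the maximum price, and its weekday comes along with it.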
Here's one of several ways to get the result you want:
db.invoices.aggregate([
{$project: {customer: 1, other:{ price: "$price", weekday: "$weekday"}}},
{$group: {
_id: '$customer',
max: {$max: '$other'}
}}
])
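The pipeline above relies on $max comparing the embedded documents field by field, in field order, so price is compared first and weekday only breaks ties. The following plain-JavaScript sketch imitates that ordering for this specific shape; compareOther and maxOtherPerCustomer are hypothetical names, and the comparison is a simplification that assumes both fields always exist with the same types.

```javascript
const invoices = [
  { customer: "john", price: 4, weekday: "WED" },
  { customer: "john", price: 8, weekday: "SUN" },
  { customer: "john", price: 6, weekday: "SAT" },
  { customer: "john", price: 5, weekday: "SUN" },
  { customer: "bob", price: 10, weekday: "SAT" },
  { customer: "bob", price: 15, weekday: "MON" },
];

// Simplified stand-in for comparing {price, weekday} sub-documents:
// price decides first, weekday is only a tie-breaker.
function compareOther(a, b) {
  if (a.price !== b.price) return a.price - b.price;
  if (a.weekday === b.weekday) return 0;
  return a.weekday < b.weekday ? -1 : 1;
}

// Hypothetical helper mirroring $project + $group with $max on "other".
function maxOtherPerCustomer(docs) {
  const best = new Map();
  for (const { customer, price, weekday } of docs) {
    const other = { price, weekday };
    const current = best.get(customer);
    if (current === undefined || compareOther(other, current) > 0) {
      best.set(customer, other);
    }
  }
  return [...best].map(([_id, max]) => ({ _id, max }));
}

console.log(maxOtherPerCustomer(invoices));
```

Putting price first in the sub-document is what makes this work; with weekday first, the maximum would be chosen by weekday string instead.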
You can use the $group operator with $max, although this returns only the maximum price per customer, without the weekday:
db.invoices.aggregate([
{ $group : { _id: '$customer', price: { $max: '$price' } } }
])
I have a MongoDB collection of documents with this structure:
{
"_id": "ID",
"email": "EMAIL",
"name": "Foo",
"surname": "Bar",
"orders": [
{
"createdAt": "2019-09-09T07:30:25.575Z"
},
{
"createdAt": "2019-10-30T14:20:04.849Z"
},
{
"createdAt": "2019-10-30T16:38:27.271Z"
},
{
"createdAt": "2020-01-03T15:49:39.614Z"
},
],
}
I need to count the orders that share the same "createdAt" day and collapse them into one entry per day, with the date reformatted as YYYY-MM-DD.
The result should look like this:
{
"_id": "ID",
"email": "EMAIL",
"name": "Foo",
"surname": "Bar",
"orders": [
{
"date": "2019-09-09",
"total": 1,
},
{
"date": "2019-10-30",
"total": 2,
},
{
"date": "2020-01-03",
"total": 1,
},
],
}
I tried $unwind on orders in db.collection.aggregate(), but I have no idea how to get this result.
Thanks in advance.
Try this on for size. Given this data:
db.foo.insert([
{
"_id": "ID",
"email": "EMAIL", "name": "Foo", "surname": "Bar",
"orders": [
{ "createdAt": new Date("2019-09-09T07:30:25.575Z") },
{ "createdAt": new Date("2019-10-30T14:20:04.849Z") },
{ "createdAt": new Date("2019-10-30T16:38:27.271Z") },
{ "createdAt": new Date("2020-01-03T15:49:39.614Z") }
]
},
{
"_id": "ID2",
"email": "EMAIL2", "name": "Bin", "surname": "Baz",
"orders": [
{ "createdAt": new Date("2019-09-09T07:30:25.575Z") },
{ "createdAt": new Date("2020-10-30T14:20:04.849Z") },
{ "createdAt": new Date("2020-10-30T16:38:27.271Z") },
{ "createdAt": new Date("2020-10-30T15:49:39.614Z") }
]
}
]);
This agg:
db.foo.aggregate([
{$unwind: "$orders"}
// First $group is on just the Y-M-D part of the date plus the id.
// This will produce the basic info the OP seeks -- but not in the desired
// data structure:
,{$group: {
_id: {orig_id: "$_id", d: {$dateToString: {date: "$orders.createdAt", format: "%Y-%m-%d"}} },
n:{$sum:1} ,
email: {$first: "$email"},
name: {$first: "$name"},
surname: {$first: "$surname"}
}}
// The group is not guaranteed to preserve the order of the dates. So now that
// the basic agg is done, reorder by DATE. _id.d is a Y-M-D string but fortunately
// that sorts correctly for our purposes:
,{$sort: {"_id.d":1}}
// ...so in the second $group, we pluck just the id from the id+YMD_date key and
// take the YMD_date+n and *push* it onto a new orders array to arrive at the
// desired data structure. We are not guaranteed the order of orig_id (e.g.
// ID or ID2) but for each id, the push *will* happen in the order of arrival -- which was
// sorted correctly in the prior stage! As an experiment, try changing the
// sort to -1 (reverse) and see what happens.
,{$group: {_id: "$_id.orig_id",
email: {$first: "$email"},
name: {$first: "$name"},
surname: {$first: "$surname"},
orders: {$push: {date: "$_id.d", total: "$n"}} }}
]);
yields this output:
{
"_id" : "ID",
"email" : "EMAIL",
"name" : "Foo",
"surname" : "Bar",
"orders" : [
{
"date" : "2019-09-09",
"total" : 1
},
{
"date" : "2019-10-30",
"total" : 2
},
{
"date" : "2020-01-03",
"total" : 1
}
]
}
{
"_id" : "ID2",
"email" : "EMAIL2",
"name" : "Bin",
"surname" : "Baz",
"orders" : [
{
"date" : "2019-09-09",
"total" : 1
},
{
"date" : "2020-10-30",
"total" : 3
}
]
}
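For readers who want to see the counting logic without a database, here is a minimal plain-JavaScript sketch of the same idea for a single document. The helper ordersPerDay is a hypothetical name; it truncates each date to its UTC day, which is also what $dateToString does when no timezone is given.

```javascript
// One document shaped like the sample data above.
const user = {
  _id: "ID",
  email: "EMAIL",
  name: "Foo",
  surname: "Bar",
  orders: [
    { createdAt: new Date("2019-09-09T07:30:25.575Z") },
    { createdAt: new Date("2019-10-30T14:20:04.849Z") },
    { createdAt: new Date("2019-10-30T16:38:27.271Z") },
    { createdAt: new Date("2020-01-03T15:49:39.614Z") },
  ],
};

// Hypothetical helper: count orders per UTC day and rebuild the orders array.
function ordersPerDay(doc) {
  const counts = new Map();
  for (const { createdAt } of doc.orders) {
    // "2019-10-30T14:20:04.849Z" -> "2019-10-30"
    const day = createdAt.toISOString().slice(0, 10);
    counts.set(day, (counts.get(day) || 0) + 1);
  }
  // Y-M-D strings sort chronologically, like the $sort on _id.d.
  const orders = [...counts.keys()]
    .sort()
    .map((date) => ({ date, total: counts.get(date) }));
  return { ...doc, orders };
}

console.log(JSON.stringify(ordersPerDay(user), null, 2));
```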
If you are willing to accept a slightly more complex return structure and some duplicated data in exchange for not having to enumerate each field (e.g. field: {$first: "$field"}), then you can do this:
db.foo.aggregate([
{$unwind: "$orders"}
,{$group: {
_id: {orig_id: "$_id", d: {$dateToString: {date: "$orders.createdAt", format: "%Y-%m-%d"}} },
n:{$sum:1} ,
ALL: {$first: "$$CURRENT"}
}}
,{$group: {_id: "$_id.orig_id",
ALL: {$first: "$ALL"},
orders: {$push: {date: "$_id.d", total: "$n"}} }}
]);
to yield this (there is no $sort stage in this version, so the order of the entries in orders is not guaranteed):
{
"_id" : "ID2",
"ALL" : {
"_id" : "ID2",
"email" : "EMAIL2",
"name" : "Bin",
"surname" : "Baz",
"orders" : {
"createdAt" : ISODate("2019-09-09T07:30:25.575Z")
}
},
"orders" : [
{
"date" : "2019-09-09",
"total" : 1
},
{
"date" : "2020-10-30",
"total" : 3
}
]
}
{
"_id" : "ID",
"ALL" : {
"_id" : "ID",
"email" : "EMAIL",
"name" : "Foo",
"surname" : "Bar",
"orders" : {
"createdAt" : ISODate("2019-10-30T14:20:04.849Z")
}
},
"orders" : [
{
"date" : "2019-10-30",
"total" : 2
},
{
"date" : "2020-01-03",
"total" : 1
},
{
"date" : "2019-09-09",
"total" : 1
}
]
}
Given the following Data:
> db.users.find({}, {name: 1, createdAt: 1, updatedAt: 1}).limit(5).pretty()
{
"_id" : ObjectId("5ec8f74f32973c7b7cb7cce9"),
"createdAt" : ISODate("2020-05-23T10:13:35.012Z"),
"updatedAt" : ISODate("2020-08-20T13:37:09.861Z"),
"name" : "Patrick Jere"
}
{
"_id" : ObjectId("5ec8ef8a2b6e5f78fa20443c"),
"createdAt" : ISODate("2020-05-23T09:40:26.089Z"),
"updatedAt" : ISODate("2020-07-23T07:54:01.833Z"),
"name" : "Austine Wiga"
}
{
"_id" : ObjectId("5ed5e1a3962a3960ad85a1a2"),
"createdAt" : ISODate("2020-06-02T05:20:35.090Z"),
"updatedAt" : ISODate("2020-07-29T14:02:52.295Z"),
"name" : "Biasi Phiri"
}
{
"_id" : ObjectId("5ed629ec6d87382c608645d9"),
"createdAt" : ISODate("2020-06-02T10:29:00.204Z"),
"updatedAt" : ISODate("2020-06-02T10:29:00.204Z"),
"name" : "Chisambwe Kalusa"
}
{
"_id" : ObjectId("5ed8d21f42bc8115f67465a8"),
"createdAt" : ISODate("2020-06-04T10:51:11.546Z"),
"updatedAt" : ISODate("2020-06-04T10:51:11.546Z"),
"name" : "Wakun Moyo"
}
...
I use the following query to return new_users by month:
db.users.aggregate([
{
$group: {
_id: {$dateToString: {format: '%Y-%m', date: '$createdAt'}},
new_users: {
$sum: {$ifNull: [1, 0]}
}
}
}
])
Example result:
[
{
"_id": "2020-06",
"new_users": 125
},
{
"_id": "2020-07",
"new_users": 147
},
{
"_id": "2020-08",
"new_users": 43
},
{
"_id": "2020-05",
"new_users": 4
}
]
And this query returns new_users, active_users and total_users for a specific month:
db.users.aggregate([
{
$group: {
_id: null,
new_users: {
$sum: {
$cond: [{
$gte: ['$createdAt', ISODate('2020-08-01')]
}, 1, 0]
}
},
active_users: {
$sum: {
$cond: [{
$gt: ['$updatedAt', ISODate('2020-02-01')]
}, 1, 0]
}
},
total_users: {
$sum: {$ifNull: [1, 0]}
}
}
}
])
How can I get the second query to return results by month, just like the first query?
Expected results, based on a one-month filter:
[
{ _id: '2020-09', new_users: 0, active_users: 69},
{ _id: '2020-08', new_users: 43, active_users: 219},
{ _id: '2020-07', new_users: 147, active_users: 276},
{ _id: '2020-06', new_users: 125, active_users: 129},
{ _id: '2020-05', new_users: 4, active_users: 4}
]
You can try the aggregation below.
It counts the new users per month, then uses $lookup to count the active users in the time window for each year-month.
db.users.aggregate([
{"$group":{
"_id":{"$dateFromParts":{"year":{"$year":"$createdAt"},"month":{"$month":"$createdAt"}}},
"new_users":{"$sum":1}
}},
{"$lookup":{
"from":"users",
"let":{"end_date":"$_id", "start_date":{"$dateFromParts":{"year":{"$year":"$_id"},"month":{"$subtract":[{"$month":"$_id"},1]}}}},
"pipeline":[
{"$match":{"$expr":
{"$and":[{"$gte":[
"$updatedAt",
"$$start_date"
]}, {"$lt":[
"$updatedAt",
"$$end_date"
]}]}
}},
{"$count":"activeUserCount"}
],
"as":"activeUsers"
}},
{"$project":{
"year-month":{"$dateToString":{"format":"%Y-%m","date":"$_id"}},
"new_users":1,
"active_users":{"$arrayElemAt":["$activeUsers.activeUserCount", 0]},
"_id":0
}}])
You can do the same as in your first query and group by createdAt. There is no need for the $ifNull operator in total_users; {$sum: 1} is enough.
Updated:
$facet to group by month separately for both counts
$project to concatenate both arrays using $concatArrays
$unwind to deconstruct the concatenated array
$group by month to merge both counts for each month
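The steps above can be sketched in plain JavaScript on the five sample users from the question. This is only an in-memory illustration, not the original pipeline: the helper usersByMonth is a hypothetical name, and it assumes that a user counts as "active" in the month in which updatedAt falls.

```javascript
// The five sample users shown in the question (only the date fields matter here).
const users = [
  { name: "Patrick Jere", createdAt: new Date("2020-05-23T10:13:35.012Z"), updatedAt: new Date("2020-08-20T13:37:09.861Z") },
  { name: "Austine Wiga", createdAt: new Date("2020-05-23T09:40:26.089Z"), updatedAt: new Date("2020-07-23T07:54:01.833Z") },
  { name: "Biasi Phiri", createdAt: new Date("2020-06-02T05:20:35.090Z"), updatedAt: new Date("2020-07-29T14:02:52.295Z") },
  { name: "Chisambwe Kalusa", createdAt: new Date("2020-06-02T10:29:00.204Z"), updatedAt: new Date("2020-06-02T10:29:00.204Z") },
  { name: "Wakun Moyo", createdAt: new Date("2020-06-04T10:51:11.546Z"), updatedAt: new Date("2020-06-04T10:51:11.546Z") },
];

// Same bucketing as $dateToString with format '%Y-%m' (UTC).
const monthOf = (date) => date.toISOString().slice(0, 7);

// Hypothetical helper: one row per month with both counts merged.
function usersByMonth(docs) {
  const byMonth = new Map();
  const bump = (month, field) => {
    const row = byMonth.get(month) || { _id: month, new_users: 0, active_users: 0 };
    row[field] += 1;
    byMonth.set(month, row);
  };
  for (const doc of docs) {
    bump(monthOf(doc.createdAt), "new_users");    // first facet: group by createdAt month
    bump(monthOf(doc.updatedAt), "active_users"); // second facet: group by updatedAt month
  }
  // Newest month first, as in the expected output of the question.
  return [...byMonth.values()].sort((a, b) => b._id.localeCompare(a._id));
}

console.log(usersByMonth(users));
```

A month that appears in only one of the two groupings still gets a row, with the other count left at 0, which is the point of merging the two facets.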
I have a collection of products, and these products have assessments. I need to select the product with the highest average assessment. The problem is that I can compute the average per product, but I cannot then select the product with the highest average.
To reproduce my problem follow these steps:
Insert products:
db.products.insert([
{
name: "Product1",
price: 1000,
features: {
feature1: 0.8,
feature2: 23
},
tags: ["tag1", "tag2", "tag3", "tag4"],
assessments: [
{name: "John", assessment: 3},
{name: "Anna", assessment: 4},
{name: "Kyle", assessment: 3.6}
]
},
{
name: "Product2",
price: 1200,
features: {
feature1: 4,
feature2: 4000,
feature3: "SDS"
},
tags: ["tag1"],
assessments: [
{name: "John", assessment: 5},
{name: "Richard", assessment: 4.8}
]
},
{
name: "Product3",
price: 450,
features: {
feature1: 1.3,
feature2: 60
},
tags: ["tag1", "tag2"],
assessments: [
{name: "Anna", assessment: 5},
{name: "Robert", assessment: 4},
{name: "John", assessment: 4},
{name: "Julia", assessment: 3}
]
},
{
name: "Product4",
price: 900,
features: {
feature1: 1700,
feature2: 17
},
tags: ["tag1", "tag2", "tag3"],
assessments: [
{name: "Monica", assessment: 3},
{name: "Carl", assessment: 4}
]
}
])
I want to compute the average assessment per product and select the product with the highest average.
I compute the averages as follows:
db.products.aggregate([
{ $unwind : "$assessments" },
{ $group:
{
_id: "$name",
avg_assessment: {$avg: "$assessments.assessment"}
}
},
{ $project:
{
_id: 0,
product: "$_id",
avg_assessment: 1
}
}
])
Result of this query is:
{ "avg_assessment" : 3.5, "product" : "Product4" }
{ "avg_assessment" : 4, "product" : "Product3" }
{ "avg_assessment" : 4.9, "product" : "Product2" }
{ "avg_assessment" : 3.533333333333333, "product" : "Product1" }
Nice. Then I try to select the product with the highest average using the following query:
db.products.aggregate([
{ $unwind : "$assessments" },
{ $group:
{
_id: "$name",
avg_assessment: { $max: {$avg: "$assessments.assessment"}}
}
},
{ $project:
{
_id: 0,
product: "$_id",
avg_assessment: 1
}
}
])
But the result has the same shape, with what look like rounded-up values:
{ "avg_assessment" : 4, "product" : "Product4" }
{ "avg_assessment" : 5, "product" : "Product3" }
{ "avg_assessment" : 5, "product" : "Product2" }
{ "avg_assessment" : 4, "product" : "Product1" }
What's going on? Where is the problem?
You can try the aggregation below. No $unwind is needed here.
Compute $avg over each product's assessments, then sort descending.
Use $group with $first to pick the product with the highest average.
Add a $project stage if you want to limit the fields.
db.products.aggregate([
{ "$addFields" : {"avg_assessment":{"$avg":"$assessments.assessment" }}},
{ "$sort":{"avg_assessment":-1}},
{ "$group":
{
"_id": null,
"highest_avg_assessment": { $first:"$$ROOT"}
}
}
])
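As a rough in-memory illustration of the same three steps (average per product, sort descending, take the first), here is a plain-JavaScript sketch. The helper highestAverage is a hypothetical name, the sample data is trimmed to the fields that matter, and the sketch assumes every product has at least one assessment.

```javascript
// Products from the question, reduced to name and assessments.
const products = [
  { name: "Product1", assessments: [{ name: "John", assessment: 3 }, { name: "Anna", assessment: 4 }, { name: "Kyle", assessment: 3.6 }] },
  { name: "Product2", assessments: [{ name: "John", assessment: 5 }, { name: "Richard", assessment: 4.8 }] },
  { name: "Product3", assessments: [{ name: "Anna", assessment: 5 }, { name: "Robert", assessment: 4 }, { name: "John", assessment: 4 }, { name: "Julia", assessment: 3 }] },
  { name: "Product4", assessments: [{ name: "Monica", assessment: 3 }, { name: "Carl", assessment: 4 }] },
];

// Hypothetical helper mirroring $addFields + $sort + $group/$first.
function highestAverage(docs) {
  // Like $addFields with $avg over the assessments array (no $unwind needed).
  const withAvg = docs.map((doc) => ({
    ...doc,
    avg_assessment:
      doc.assessments.reduce((sum, a) => sum + a.assessment, 0) /
      doc.assessments.length,
  }));
  // Like {$sort: {avg_assessment: -1}} followed by taking the first document.
  withAvg.sort((a, b) => b.avg_assessment - a.avg_assessment);
  return withAvg[0];
}

const best = highestAverage(products);
console.log(best.name, best.avg_assessment);
```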
This might help:
db.products.aggregate([
{ $unwind : "$assessments" },
{ $group:
{
_id: "$name",
avg_assessment: {$avg: "$assessments.assessment"}
}
},
{
$sort: { avg_assessment: -1 } // sort by avg_assessment descending
},
{
$limit: 1 // only return one document
}
])
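As for why the $max attempt in the question returned 4, 5, 5 and 4: after $unwind, each document carries a single assessment, and the average of one number is that number, so $max ends up picking the largest individual assessment per product rather than the largest average. The plain-JavaScript sketch below reproduces those values; maxAssessmentPerProduct is a hypothetical name used only for illustration.

```javascript
// Products from the question, reduced to name and assessment values.
const products = [
  { name: "Product1", assessments: [{ assessment: 3 }, { assessment: 4 }, { assessment: 3.6 }] },
  { name: "Product2", assessments: [{ assessment: 5 }, { assessment: 4.8 }] },
  { name: "Product3", assessments: [{ assessment: 5 }, { assessment: 4 }, { assessment: 4 }, { assessment: 3 }] },
  { name: "Product4", assessments: [{ assessment: 3 }, { assessment: 4 }] },
];

// Hypothetical helper: what $max over per-document "averages" really computes.
function maxAssessmentPerProduct(docs) {
  const result = {};
  for (const doc of docs) {
    // Inner loop plays the role of $unwind: one assessment at a time.
    for (const { assessment } of doc.assessments) {
      const avgOfOne = assessment; // average of a single value is the value itself
      if (!(doc.name in result) || avgOfOne > result[doc.name]) {
        result[doc.name] = avgOfOne;
      }
    }
  }
  return result;
}

console.log(maxAssessmentPerProduct(products));
```

The values are not rounded averages; they are simply the highest single assessment each product received.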
Given this data:
[ {country:"US", city:"NY", cnt: 10},
{country:"IT", city:"MI", cnt: 9},
{country:"US", city:"LA", cnt: 8},
{country:"IT", city:"RM", cnt: 20} ]
Is there a way to use the MongoDB aggregation pipeline to create a result array that looks like this (ordered by cnt, not alphabetically by country code):
[ {country:"IT", city:"RM", cnt: 20},
{country:"IT", city:"MI", cnt: 9},
{country:"US", city:"NY", cnt: 10},
{country:"US", city:"LA", cnt: 8} ]
In other words, an array sorted (descending) first by the total of each country and then by the cnt of each city within that country?
I can group by country, or by country and city, but neither gives me the result above. The first gives me two lines with the totals for each country; the second gives me four lines with country and city totals, but not sorted by the country with the highest total.
Just add $sort after the $group:
{ "$sort": { "country": 1, "cnt": -1 } }
Results in:
{ "country" : "IT", "city" : "RM", "cnt" : 20 }
{ "country" : "IT", "city" : "MI", "cnt" : 9 }
{ "country" : "US", "city" : "NY", "cnt" : 10 }
{ "country" : "US", "city" : "LA", "cnt" : 8 }
To sort by the country totals instead, first group to get the total count per country:
{ "$group": {
"_id": "$country",
"cities": { "$push": {
"city": "$city",
"cnt": "$cnt"
}},
"totalCount": { "$sum": "$cnt" }
}},
{ "$unwind": "$cities" },
{ "$sort": { "totalCount": -1, "_id": 1, "cities.cnt": -1 }},
{ "$project": {
"_id": 0,
"country": "$_id",
"city": "$cities.city",
"cnt": "$cities.cnt"
}}
The final $project restores the original document shape, so you get the same result as above.
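The ordering rule in that pipeline (country total descending, then country, then city cnt descending) can be checked with a small plain-JavaScript sketch. The helper sortByCountryTotal is a hypothetical name and works on an in-memory array rather than a collection.

```javascript
// Sample rows from the question.
const rows = [
  { country: "US", city: "NY", cnt: 10 },
  { country: "IT", city: "MI", cnt: 9 },
  { country: "US", city: "LA", cnt: 8 },
  { country: "IT", city: "RM", cnt: 20 },
];

// Hypothetical helper mirroring $group (totals) + $unwind + $sort + $project.
function sortByCountryTotal(docs) {
  // Like "totalCount": {"$sum": "$cnt"} grouped by country.
  const totals = new Map();
  for (const { country, cnt } of docs) {
    totals.set(country, (totals.get(country) || 0) + cnt);
  }
  // Like {"$sort": {"totalCount": -1, "_id": 1, "cities.cnt": -1}}.
  return [...docs].sort(
    (a, b) =>
      totals.get(b.country) - totals.get(a.country) ||
      a.country.localeCompare(b.country) ||
      b.cnt - a.cnt
  );
}

console.log(sortByCountryTotal(rows));
```

Here IT totals 29 and US totals 18, so the IT rows come first even though NY has a higher cnt than MI.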
Let's say I have two report documents, each with embedded line_items:
Reports with embedded line_items
{
_id: "1",
week_number: "1",
line_items: [
{
cash: "5",
miscellaneous: "10"
},
{
cash: "20",
miscellaneous: "0"
}
]
},
{
_id: "2",
week_number: "2",
line_items: [
{
cash: "100",
miscellaneous: "0"
},
{
cash: "10",
miscellaneous: "0"
}
]
}
What I need to do is perform a set of additions on each line_item (in this case cash + miscellaneous) and have the grand total set on the reports query as a 'gross' field. I would like to end up with the following result:
Desired result
{ _id: "1", week_number: "1", gross: "35" },{ _id: "2", week_number: "2", gross: "110" }
I have tried the following query, to no avail:
db.reports.aggregate([{$unwind: "$line_items"},{$group: {_id : "$_id", gross: {$sum : {$add: ["$cash", "$miscellaneous"]}}}}]);
You can't sum strings, so you'll first need to change the data type of the cash and miscellaneous fields in your docs to a numeric type.
But once you do that, you can sum them by including the line_items. prefix on those fields in your aggregate command:
db.reports.aggregate([
{$unwind: "$line_items"},
{$group: {
_id : "$_id",
gross: {$sum : {$add: ["$line_items.cash", "$line_items.miscellaneous"]}}
}}
]);
Output:
{
"result" : [
{
"_id" : "2",
"gross" : 110
},
{
"_id" : "1",
"gross" : 35
}
],
"ok" : 1
}
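If changing the stored data type is not an option right away, the conversion can also be done at query time. The sketch below is an illustration under assumptions, not part of the original answer: the pipeline constant uses the $toDouble conversion operator, which assumes a server version that supports it (MongoDB 4.0 or later), and it is only defined here, not executed. The helper grossPerReport is a hypothetical in-memory equivalent so the arithmetic can be checked without a database.

```javascript
// Documents from the question, with amounts stored as strings.
const reports = [
  { _id: "1", week_number: "1", line_items: [{ cash: "5", miscellaneous: "10" }, { cash: "20", miscellaneous: "0" }] },
  { _id: "2", week_number: "2", line_items: [{ cash: "100", miscellaneous: "0" }, { cash: "10", miscellaneous: "0" }] },
];

// Pipeline variant (not executed here); assumes $toDouble is available on the server.
const pipeline = [
  { $unwind: "$line_items" },
  { $group: {
      _id: "$_id",
      gross: { $sum: { $add: [
        { $toDouble: "$line_items.cash" },
        { $toDouble: "$line_items.miscellaneous" },
      ] } },
  } },
];

// Hypothetical in-memory equivalent: convert the strings, then sum per report.
function grossPerReport(docs) {
  return docs.map(({ _id, week_number, line_items }) => ({
    _id,
    week_number,
    gross: line_items.reduce(
      (sum, item) => sum + Number(item.cash) + Number(item.miscellaneous),
      0
    ),
  }));
}

console.log(grossPerReport(reports));
```

With real data, the pipeline would be passed to db.reports.aggregate(pipeline); storing the amounts as numbers in the first place remains the simpler fix.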