MongoDB aggregation, counting each item in array and grouping by item - mongodb

Here I have an array with duplicate items like this:
[
'gg',
'bb',
'dd',
'cc',
'll',
'aa',
'cc',
'gg',
'bb',
'dd',
'cc',
'bb',
'dd',
'll',
'aa',
]
and what I want to return is this:
{
'gg': 2,
'bb': 3,
'dd': 3,
'cc': 3,
'll': 2,
'aa': 2,
}
Can this be done with MongoDB aggregation? I'd appreciate any help.

Use $unwind and $group as stages of the aggregation pipeline:
Query:
db.collection.aggregate([
{
$unwind: "$items"
},
{
$group: {
_id: "$items",
count: {
$sum: 1
}
}
}
])
Result:
{
"_id": "ll",
"count": 2
},
{
"_id": "gg",
"count": 2
},
{
"_id": "bb",
"count": 3
},
{
"_id": "cc",
"count": 3
},
{
"_id": "aa",
"count": 2
},
{
"_id": "dd",
"count": 3
}
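The pipeline above returns one document per item, while the question asks for a single object. As a rough sketch, the per-item counts could be folded into one object with two extra stages: a $group that pushes k/v pairs, then $replaceRoot with $arrayToObject (available from MongoDB 3.4.4). The field name items comes from the answer above; pairs, singleObjectPipeline and countItems are illustrative names that do not appear in the original. The plain JavaScript function only mimics what the pipeline computes, so the logic can be checked without a server.

```javascript
// Assumed extension of the $unwind/$group stages above, intended to return
// a single document such as { gg: 2, bb: 3, ... }.
const singleObjectPipeline = [
  { $unwind: "$items" },
  { $group: { _id: "$items", count: { $sum: 1 } } },
  // Collect every (item, count) pair into one array of { k, v } entries.
  { $group: { _id: null, pairs: { $push: { k: "$_id", v: "$count" } } } },
  // Turn the k/v array into an object and make it the whole document.
  { $replaceRoot: { newRoot: { $arrayToObject: "$pairs" } } }
];

// Plain JavaScript equivalent of the counting step, for a local check.
function countItems(items) {
  const counts = {};
  for (const item of items) {
    counts[item] = (counts[item] || 0) + 1;
  }
  return counts;
}

// The array from the question.
const items = ["gg", "bb", "dd", "cc", "ll", "aa", "cc", "gg",
               "bb", "dd", "cc", "bb", "dd", "ll", "aa"];
console.log(countItems(items));
```

The key order of the resulting object is not guaranteed by $group, so it should not be relied on.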

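This also works well when the values live in a top-level field rather than an array. It groups by the field and keeps only the values that occur more than once: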
db.users.aggregate([
{
$group: {
_id: "$email",
count: { $sum: 1 }
}
},
{
$match: {
count: { $gt: 1 }
}
}
])
Output:
{ "_id" : "a@gmail.com", "count" : 2 }
{ "_id" : "b@gmail.com", "count" : 2 }
{ "_id" : "c@gmaiL.com", "count" : 8 }
{ "_id" : "d@gmail.com", "count" : 2 }
{ "_id" : "e@gmail.com", "count" : 2 }
{ "_id" : "f@gmail.com", "count" : 2 }

Related

How to write one query (count distinct, sum) in MongoDB?

Query: select count(distinct finish_date), sum(study_num) from table where student_id=1234
Documents:
{
"_id" : ObjectId("602252684a43d5b364f3e6ca"),
"student_id" : 1234,
"study_num" : 8,
"finish_date" : "20210209",
},
{
"_id" : ObjectId("602257594a43d5b364f4cc6a"),
"student_id" : 1234,
"study_num" : 7,
"finish_date" : "20210207",
},
{
"_id" : ObjectId("5fbb65580d685b17fa56e18f"),
"student_id" : 2247,
"study_num" : 6,
"finish_date" : "20210209",
}
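You can use $match followed by two $group stages. The first groups by finish_date, so the second counts one document per distinct date: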
db.collection.aggregate([
{
"$match": {"student_id": 1234}
},
{
"$group": {
"_id": "$finish_date",
"study_sum": { $sum: "$study_num" }
}
},
{
"$group": {
"_id": null,
"study_sum": { $sum: "$study_sum" },
count: { $sum: 1 }
}
}
])
For the query
select count(distinct finish_date), sum(study_num) from table where student_id=1234
you can write an aggregation like this:
db.collection.aggregate([
{
$match: { student_id: 1234 }
},
{
$group: {
_id: "",
distinct_dates: { $addToSet: "$finish_date" },
study_sum: { $sum: "$study_num" }
}
},
{
$project: {
count: { $size: "$distinct_dates" },
study_sum: 1, _id: 0
}
}
])
The output: { "study_sum" : 15, "count" : 2 }
Reference: SQL to Aggregation Mapping Chart
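To check the expected numbers without a server, the same steps can be mirrored in plain JavaScript. This is only a sketch of what the $match, $addToSet, $sum and $size steps compute on the sample documents from the question; the function name countDistinctAndSum is made up for illustration.

```javascript
// Sample documents from the question (the _id fields are omitted).
const docs = [
  { student_id: 1234, study_num: 8, finish_date: "20210209" },
  { student_id: 1234, study_num: 7, finish_date: "20210207" },
  { student_id: 2247, study_num: 6, finish_date: "20210209" }
];

function countDistinctAndSum(documents, studentId) {
  // $match: keep only the requested student.
  const matched = documents.filter((d) => d.student_id === studentId);
  // $addToSet: collect the distinct finish dates.
  const distinctDates = new Set(matched.map((d) => d.finish_date));
  // $sum: add up study_num.
  const studySum = matched.reduce((sum, d) => sum + d.study_num, 0);
  // $size: the number of distinct dates.
  return { study_sum: studySum, count: distinctDates.size };
}

console.log(countDistinctAndSum(docs, 1234)); // { study_sum: 15, count: 2 }
```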

mongodb: match, group by multiple fields, project and count

I'm learning MongoDB and I have a collection of writers to practice on.
I'm trying to count works grouped by country, filtered by the gender of the author. This is what I have accomplished so far:
db.writers.aggregate([
{ "$match": { "gender": {"$ne": male}}},
{ "$group": {
"_id": {
"country_id": "$country_id",
"type": "$type"
},
}},
{ "$group": {
"_id": "$_id.country_id",
"literary_work": {
"$push": {
"type": "$_id.type",
"count": { "$sum": "$type" }
}
},
"total": { "$sum": "$type" }
}},
{ "$sort": { "country_id": 1 } },
{ "$project": {
"literary_work": { "$slice": [ "$literary_work", 3 ] },
"total": { "$sum": "$type" }
}}
])
Sadly, the output that I get is not the one I'm expecting:
"_id" : "GREAT BRITAIN",
"literary_work" : [
{
"type" : "POEM",
"count" : 0
},
{
"type" : "NOVEL",
"count" : 0
},
{
"type" : "SHORT STORY",
"count" : 0
}
],
"total" : 0
Could anyone tell me where I should insert the count stage, or what my mistake is?
upd:
Data sample:
{
"_id" : ObjectId("5f115c5d5f62f9f482cd7a49"),
"author" : "George Sand",
"gender" : "female",
"country_id" : "FRANCE",
"title": "Consuelo",
"type" : "NOVEL",
}
Expected result (NB! this is a result for both genders):
{
"_id" : "FRANCE",
"count" : 59.0,
"literary_work" : [
{
"type" : "POEM",
"count" : 14.0
},
{
"type" : "NOVEL",
"count" : 34.0
},
{
"type" : "SHORT STORY",
"count" : 11.0
}
]
}
Your approach is on the right track, but a few things are missing:
the first $group has no count accumulator;
the second $group should sum that count to get the total per country and push it into literary_work;
the $project stage is not needed in your query.
Corrected query:
db.writers.aggregate([
{
$match: {
gender: { $ne: "male" }
}
},
{
$group: {
_id: {
country_id: "$country_id",
type: "$type"
},
// this accumulator was missing: number of works per country/type pair
count: { $sum: 1 }
}
},
{
$group: {
_id: "$_id.country_id",
// total per country: sum of the counts from the first group
count: { $sum: "$count" },
literary_work: {
$push: {
type: "$_id.type",
// per-type count carried over from the first group
count: "$count"
}
}
}
},
// corrected from country_id to _id
{
$sort: { "_id": 1 }
}
])
Working Playground: https://mongoplayground.net/p/JWP7qdDY6cc
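The two-level grouping can also be traced in plain JavaScript. The sketch below mirrors the corrected pipeline stage by stage. Only the first document comes from the question; the other three (Author A, Author B, Author C and their titles) are hypothetical data added for illustration, and the function name countWorksByCountry is made up.

```javascript
// Only the George Sand document is from the question; the rest is hypothetical.
const writers = [
  { author: "George Sand", gender: "female", country_id: "FRANCE", title: "Consuelo", type: "NOVEL" },
  { author: "Author A", gender: "female", country_id: "FRANCE", title: "Title 1", type: "NOVEL" },
  { author: "Author B", gender: "female", country_id: "FRANCE", title: "Title 2", type: "POEM" },
  { author: "Author C", gender: "male", country_id: "FRANCE", title: "Title 3", type: "POEM" }
];

function countWorksByCountry(docs) {
  // $match + first $group: count documents per (country_id, type) pair.
  const perPair = new Map();
  for (const d of docs.filter((w) => w.gender !== "male")) {
    const key = JSON.stringify([d.country_id, d.type]);
    perPair.set(key, (perPair.get(key) || 0) + 1);
  }
  // Second $group: roll the pair counts up per country.
  const perCountry = new Map();
  for (const [key, count] of perPair) {
    const [country, type] = JSON.parse(key);
    const entry = perCountry.get(country) || { _id: country, count: 0, literary_work: [] };
    entry.count += count;                      // count: { $sum: "$count" }
    entry.literary_work.push({ type, count }); // $push of { type, count }
    perCountry.set(country, entry);
  }
  // $sort: order by _id ascending.
  return [...perCountry.values()].sort((a, b) => a._id.localeCompare(b._id));
}

console.log(JSON.stringify(countWorksByCountry(writers), null, 2));
```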

Apply multistage grouping in MongoDb Aggregation Framework

Let's assume I have the following data:
[
{ name: "Clint", hairColor: "brown", shoeSize: 8, income: 20000 },
{ name: "Clint", hairColor: "blond", shoeSize: 9, income: 30000 },
{ name: "George", hairColor: "brown", shoeSize: 7, income: 30000 },
{ name: "George", hairColor: "blond", shoeSize: 8, income: 10000 },
{ name: "George", hairColor: "blond", shoeSize: 9, income: 20000 }
]
I want to have the following output:
[
{
name: "Clint",
counts: 2,
avgShoesize: 8.5,
shoeSizeByHairColor: [
{ _id: "brown", counts: 1, avgShoesize: 8 },
{ _id: "blond", counts: 1, avgShoesize: 9 },
],
incomeByHairColor: [
{ _id: "brown", counts: 1, avgIncome: 20000 },
{ _id: "blond", counts: 1, avgIncome: 30000 },
]
},
{
name: "George",
counts: 3,
avgShoesize: 8,
shoeSizeByHairColor: [
{ _id: "brown", counts: 1, avgShoesize: 7 },
{ _id: "blond", counts: 2, avgShoesize: 8.5 },
],
incomeByHairColor: [
{ _id: "brown", counts: 1, avgIncome: 30000 },
{ _id: "blond", counts: 2, avgIncome: 15000 },
],
}
]
Basically, I want to group my dataset by some key and then compute several sub-groupings within each group.
First I thought of applying a $group on the key name and then using $facet to run the various aggregations. I guess this will not work, since $facet does not operate on the subset produced by the previous $group. If I use $facet first, I would need to split the result into multiple documents.
Any ideas on how to properly solve my problem?
You need two $group stages. The first one should aggregate by name and hairColor, and the second one can build the nested arrays:
db.collection.aggregate([
{
$group: {
_id: { name: "$name", hairColor: "$hairColor" },
count: { $sum: 1 },
sumShoeSize: { $sum: "$shoeSize" },
avgShoeSize: { $avg: "$shoeSize" },
avgIncome: { $avg: "$income" },
docs: { $push: "$$ROOT" }
}
},
{
$group: {
_id: "$_id.name",
count: { $sum: "$count" },
sumShoeSize: { $sum: "$sumShoeSize" },
shoeSizeByHairColor: {
$push: {
_id: "$_id.hairColor", counts: "$count", avgShoeSize: "$avgShoeSize"
}
},
incomeByHairColor: {
$push: {
_id: "$_id.hairColor", counts: "$count", avgIncome: "$avgIncome"
}
}
}
},
{
$project: {
_id: 1,
count: 1,
avgShoeSize: { $divide: [ "$sumShoeSize", "$count" ] },
shoeSizeByHairColor: 1,
incomeByHairColor: 1
}
}
])
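The pipeline above carries sumShoeSize through both groups and divides by count at the end, instead of averaging the per-hair-colour averages. A small worked example on George's documents from the question shows why; the variable names are illustrative only.

```javascript
// George's shoe sizes from the question, grouped by hair colour.
const groups = [
  { hairColor: "brown", shoeSizes: [7] },
  { hairColor: "blond", shoeSizes: [8, 9] }
];

const sum = (xs) => xs.reduce((a, b) => a + b, 0);

// What the pipeline does: total of the sums divided by total of the counts.
const totalSum = sum(groups.map((g) => sum(g.shoeSizes)));     // 7 + 17 = 24
const totalCount = sum(groups.map((g) => g.shoeSizes.length)); // 1 + 2 = 3
const overallAvg = totalSum / totalCount;                      // 24 / 3 = 8

// Averaging the per-group averages weights each hair colour equally,
// which gives a different number when the group sizes differ.
const groupAvgs = groups.map((g) => sum(g.shoeSizes) / g.shoeSizes.length); // [7, 8.5]
const avgOfAvgs = sum(groupAvgs) / groupAvgs.length;                        // 15.5 / 2 = 7.75

console.log(overallAvg, avgOfAvgs); // 8 7.75
```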
Phase 1: Group by name and hairColor and accumulate count, avgShoeSize, avgIncome and hairColors.
Phase 2: Build the incomeByHairColor and shoeSizeByHairColor arrays using the $map operator.
Phase 3: Finally, group by name and accumulate incomeByHairColor, shoeSizeByHairColor and count.
Pipeline:
db.users.aggregate([
{
$group :{
_id: {
name : "$name",
hairColor: "$hairColor"
},
count : {"$sum": 1},
avgShoeSize: {$avg: "$shoeSize"},
avgIncome : {$avg: "$income"},
hairColors : {$addToSet:"$hairColor" }
}
},
{
$project: {
_id:0,
name : "$_id.name",
hairColor: "$_id.hairColor",
count : "$count",
incomeByHairColor : {
$map: {
input: "$hairColors",
as: "key",
in: {
_id: "$$key",
counts: "$count",
avgIncome: "$avgIncome"
}
}
},
shoeSizeByHairColor:{
$map: {
input: "$hairColors",
as: "key",
in: {
_id: "$$key",
counts: "$count",
avgShoeSize: "$avgShoeSize"
}
}
}
}
},
{
$group: {
_id : "$name",
count : {$sum: "$count"},
incomeByHairColor: {$push : "$incomeByHairColor"},
shoeSizeByHairColor : {$push : "$shoeSizeByHairColor"}
}
}
]
)
Output:
/* 1 */
{
"_id" : "Clint",
"count" : 2,
"incomeByHairColor" : [
[
{
"_id" : "blond",
"counts" : 1,
"avgIncome" : 30000
}
],
[
{
"_id" : "brown",
"counts" : 1,
"avgIncome" : 20000
}
]
],
"shoeSizeByHairColor" : [
[
{
"_id" : "blond",
"counts" : 1,
"avgShoeSize" : 9
}
],
[
{
"_id" : "brown",
"counts" : 1,
"avgShoeSize" : 8
}
]
]
},
/* 2 */
{
"_id" : "George",
"count" : 3,
"incomeByHairColor" : [
[
{
"_id" : "blond",
"counts" : 2,
"avgIncome" : 15000
}
],
[
{
"_id" : "brown",
"counts" : 1,
"avgIncome" : 30000
}
]
],
"shoeSizeByHairColor" : [
[
{
"_id" : "blond",
"counts" : 2,
"avgShoeSize" : 8.5
}
],
[
{
"_id" : "brown",
"counts" : 1,
"avgShoeSize" : 7
}
]
]
}
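In this output every entry is wrapped in its own one-element array, because the last $group pushes arrays rather than objects. If a flat array is wanted, as in the question's expected result, one option is to flatten on the client. The sketch below uses the standard Array.prototype.flat on a document abbreviated from the output above; it is one possible post-processing step, not part of the original answer.

```javascript
// One result document, abbreviated from the output above.
const clint = {
  _id: "Clint",
  count: 2,
  incomeByHairColor: [
    [{ _id: "blond", counts: 1, avgIncome: 30000 }],
    [{ _id: "brown", counts: 1, avgIncome: 20000 }]
  ]
};

// flat() removes one level of nesting: [[a], [b]] becomes [a, b].
const flattened = {
  ...clint,
  incomeByHairColor: clint.incomeByHairColor.flat()
};

console.log(JSON.stringify(flattened, null, 2));
```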

Multilevel $group using mongodb

I am trying to get the count of each distinct value of a key in my MongoDB collection. I do get the counts, but each variety is split across several objects instead of one.
{ "_id" : ObjectId("596f6e95b6a1aa8d363befeb"), produce:"potato","variety" : "abc", "state" : 'PA' }
{ "_id" : ObjectId("596f6e95b6a1aa8d363befec"), produce:"potato", "variety" : "abc", "state" : 'PA' }
{ "_id" : ObjectId("596f6e95b6a1aa8d363befed"), produce:"potato", "variety" : "def", "state" : 'IA' }
{ "_id" : ObjectId("596f6e95b6a1aa8d363befee"), produce:"potato", "variety" : "def", "state" : 'IA' }
{ "_id" : ObjectId("596f6e95b6a1aa8d363befef"), produce:"potato", "variety" : "abc", "state" : 'DA' }
{ "_id" : ObjectId("596f6e95b6a1aa8d363befeg"), produce:"potato", "variety" : "abc", "state" : 'DA' }
{ "_id" : ObjectId("596f6e95b6a1aa8d363befeh"), produce:"potato", "variety" : "def", "state" : 'DA' }
{ "_id" : ObjectId("596f6e95b6a1aa8d363befei"), produce:"potato", "variety" : "abc", "state" : 'IA' }
db.collection.aggregate([
{
$match:{produce: "potato"}
},
{
"$group":{
"_id":{"variety":"$variety","state":"$state"},
"count":{"$sum":1}
}
},
{
"$group":{
"_id":null,
"counts":{
"$push": {"filterkey":"$_id.variety","state":"$_id.state","count":"$count"}
}
}
},
])
Actual Result : -
counts
[
{ filterkey: 'abc', state: 'PA', count: 2},
{ filterkey: 'abc', state: 'IA', count: 1},
{ filterkey: 'abc', state: 'DA', count: 2},
{ filterkey: 'def', state: 'IA', count: 2},
{ filterkey: 'def', state: 'DA', count: 1}
]
Expected Result : -
counts
[
{ filterkey: 'abc', states: {'PA':2,'IA':1,'DA':2} },
{ filterkey: 'def', states: {'IA':2,'DA':1} }
]
Is there some way to get the data like this?
You need multilevel grouping here. First, $group by the variety and state fields and use $sum to count the documents for each variety and state pair.
Second, $group by variety alone and $push each state with its count as a k/v pair.
Finally, use $arrayToObject to turn the states array into a single object.
db.collection.aggregate([
{ "$match": { "produce": "potato" }},
{ "$group": {
"_id": { "variety": "$variety", "state": "$state" },
"count": { "$sum": 1 }
}},
{ "$group": {
"_id": "$_id.variety",
"states": {
"$push": {
"k": "$_id.state",
"v": "$count"
}
}
}},
{ "$addFields": {
"states": {
"$arrayToObject": "$states"
}
}}
])
You can remove the stages one by one to see what each of them does.
Output
[
{
"_id": "def",
"states": {
"DA": 1,
"IA": 2
}
},
{
"_id": "abc",
"states": {
"DA": 2,
"IA": 1,
"PA": 2
}
}
]
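The last stage relies on $arrayToObject, which expects an array of { k, v } pairs and returns one object. A minimal plain JavaScript sketch of that conversion, using the pairs for variety "abc" from the data above, might look like this. The helper name arrayToObject is made up here to mirror the operator, and the order of the pushed pairs is not guaranteed on a real server.

```javascript
// Shape of one document after the second $group stage.
const grouped = {
  _id: "abc",
  states: [
    { k: "PA", v: 2 },
    { k: "IA", v: 1 },
    { k: "DA", v: 2 }
  ]
};

// Local stand-in for { $arrayToObject: "$states" }.
function arrayToObject(pairs) {
  return Object.fromEntries(pairs.map(({ k, v }) => [k, v]));
}

// Equivalent of the $addFields stage: replace states with the object form.
const result = { ...grouped, states: arrayToObject(grouped.states) };

console.log(result);
```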

Nested output from mongo aggregate query

This is right out of the MongoDB aggregation documentation. Let's say I have this set of documents:
{ _id: 1, cust_id: "abc1", ord_date: ISODate("2012-11-02T17:04:11.102Z"), status: "A", amount: 50 }
{ _id: 2, cust_id: "xyz1", ord_date: ISODate("2013-10-01T17:04:11.102Z"), status: "A", amount: 100 }
{ _id: 3, cust_id: "xyz1", ord_date: ISODate("2013-10-12T17:04:11.102Z"), status: "D", amount: 25 }
{ _id: 4, cust_id: "xyz1", ord_date: ISODate("2013-10-11T17:04:11.102Z"), status: "D", amount: 125 }
{ _id: 5, cust_id: "abc1", ord_date: ISODate("2013-11-12T17:04:11.102Z"), status: "A", amount: 25 }
I can run this aggregate query:
db.orders.aggregate([
{ $match: { status: "A" } },
{ $group: { _id: "$cust_id", total: { $sum: "$amount" } } },
{ $sort: { total: -1 } }
])
To get this response:
{ "_id" : "xyz1", "total" : 100 }
{ "_id" : "abc1", "total" : 75 }
But what if I want the response in a nested format? Is there any way to achieve that without using mapReduce? Something like this:
{ "_id" : "xyz1", "amount": { "total" : 100 } }
{ "_id" : "abc1", "amount": { "total" : 75 } }
You need to reshape your documents using the $project stage:
db.collection.aggregate([
{ "$group": {
"_id": "$cust_id",
"total": { "$sum": "$amount" }
}},
{ "$project": { "amount.total": "$total" } },
{ "$sort": { "amount.total": -1 } }
])
Which returns the following (this pipeline has no $match stage, so all of xyz1's orders are summed):
{ "_id" : "xyz1", "amount" : { "total" : 250 } }
{ "_id" : "abc1", "amount" : { "total" : 75 } }
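To get the totals from the question (100 and 75), the $match on status would presumably need to stay in front of the $group. The sketch below shows that variant of the pipeline as data, together with a plain JavaScript function that mimics it on the question's documents so the numbers can be checked locally. The names nestedTotalsPipeline and nestedTotals are illustrative, and ord_date is omitted from the sample because it does not affect the result.

```javascript
// Assumed variant of the answer's pipeline with the question's $match kept.
const nestedTotalsPipeline = [
  { $match: { status: "A" } },
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } },
  { $project: { "amount.total": "$total" } },
  { $sort: { "amount.total": -1 } }
];

// Documents from the question, without ord_date.
const orders = [
  { _id: 1, cust_id: "abc1", status: "A", amount: 50 },
  { _id: 2, cust_id: "xyz1", status: "A", amount: 100 },
  { _id: 3, cust_id: "xyz1", status: "D", amount: 25 },
  { _id: 4, cust_id: "xyz1", status: "D", amount: 125 },
  { _id: 5, cust_id: "abc1", status: "A", amount: 25 }
];

// Plain JavaScript equivalent of the four stages.
function nestedTotals(docs) {
  const totals = new Map();
  for (const d of docs.filter((o) => o.status === "A")) {           // $match
    totals.set(d.cust_id, (totals.get(d.cust_id) || 0) + d.amount); // $group
  }
  return [...totals]
    .map(([custId, total]) => ({ _id: custId, amount: { total } })) // $project
    .sort((a, b) => b.amount.total - a.amount.total);               // $sort
}

console.log(JSON.stringify(nestedTotals(orders)));
```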