MongoDB group by elements in array - mongodb

I have a collection that looks like this:
{
"id": "id1",
"tags": ['a', 'b']
},
{
"id": "id2",
"tags": ['b', 'c']
},
{
"id": "id3",
"tags": ['a', 'c']
}
How can I make a query that groups by every element in the "tags" array, so the result looks like this?:
{'a': 2},
{'b': 2},
{'c': 2}
(where 2 is the number of times it appears, the count).
Thanks for your help!

You can use this aggregation query:
First $unwind the array to deconstruct an access like objects.
Then $group by tags and $sum 1 to get the total.
And last use $replaceRoot with $arrayToObject to get the desired output.
db.collection.aggregate([
{
"$unwind": "$tags"
},
{
"$group": {
"_id": "$tags",
"count": {
"$sum": 1
}
}
},
{
"$replaceRoot": {
"newRoot": {
"$arrayToObject": [
[
{
"k": "$_id",
"v": "$count"
}
]
]
}
}
}
])
Example here
As an adittion, if you want to get sorted values (a, b, c...) you can add $sort stage like this example

Related

MongoDB Aggregation - Count, lookup [duplicate]

I have a collection that looks like this:
{
"id": "id1",
"tags": ['a', 'b']
},
{
"id": "id2",
"tags": ['b', 'c']
},
{
"id": "id3",
"tags": ['a', 'c']
}
How can I make a query that groups by every element in the "tags" array, so the result looks like this?:
{'a': 2},
{'b': 2},
{'c': 2}
(where 2 is the number of times it appears, the count).
Thanks for your help!
You can use this aggregation query:
First $unwind the array to deconstruct an access like objects.
Then $group by tags and $sum 1 to get the total.
And last use $replaceRoot with $arrayToObject to get the desired output.
db.collection.aggregate([
{
"$unwind": "$tags"
},
{
"$group": {
"_id": "$tags",
"count": {
"$sum": 1
}
}
},
{
"$replaceRoot": {
"newRoot": {
"$arrayToObject": [
[
{
"k": "$_id",
"v": "$count"
}
]
]
}
}
}
])
Example here
As an adittion, if you want to get sorted values (a, b, c...) you can add $sort stage like this example

How to add calculated fields inside subdocuments without using unwind?

I'm looking for a simple solution to add a field to a subdocument without using $unwind and $group.
I need only to calculate the sum and the size of a nested subdocuments and show it in a new field.
This is my starting collection:
{
"_id": ObjectId("5a934e000102030405000000"),
"subDoc": [
{
"a": 1,
"subArray": [1,2,3]
},
{
"b": 2
}
]
},
{
"_id": ObjectId("5a934e000102030405000001"),
"subDoc": [
{
"a": 1,
"subArray": [4,5,6]
},
{
"b": 2
},
{
"c": 3,
"subArray": [8,8,8]
}
]
}
And this is my desired result, where I've added sum (sum of subArray) and size (number of elements in subArray):
{
"_id": ObjectId("5a934e000102030405000000"),
"subDoc": [
{
"a": 1,
"subArray": [1,2,3],
"sum": 6,
"size": 3
},
{
"b": 2
"sum": 0,
"size": 0
}
]
},
{
"_id": ObjectId("5a934e000102030405000001"),
"subDoc": [
{
"a": 1,
"subArray": [4,5,6],
"sum": 15,
"size": 3
},
{
"b": 2,
"sum": 0,
"size": 0
},
{
"c": 3,
"subArray": [8,8],
"sum": 16,
"size": 2
}
]
}
I know how to obtain this result using $unwind and then $group, but I'd like to know if there is any other way (or a better way!) to achieve the same result. I've tried using $addFields and $map without success.
Working playground example: https://mongoplayground.net/p/fK8t6SLlOHa
$map to iterate loop of subDoc array
$sum to get total of subArray array of numbers
$ifNull to check if field is not present or null then return empty array because $size operator only allows array input
$size to get total elements in subArray array
$mergeObjects to merge current object with new added fields
db.collection.aggregate([
{
$addFields: {
subDoc: {
$map: {
input: "$subDoc",
in: {
$mergeObjects: [
"$$this",
{
sum: { $sum: "$$this.subArray" },
size: {
$size: { $ifNull: ["$$this.subArray", []] }
}
}
]
}
}
}
}
}
])
Playground

how to get the sum of a particular element in all the objects of an array using pymongo

Below is my collection
[{'_id': ObjectId('603e9cc2784fa0d80d8672cd'),
'name': 'balaji',
'items': [{'price': 1, 'price_range': 'A'},
{'price': 6, 'price_range': 'B'},
{'price': 4, 'price_range': 'C'}]}]
So in the above collection, we can see only one record and it contains an array with name items and this array contains objects with price and price_range attributes, may I know how to get the sum of all the prices in this array please, I tried with below query and it did not work
aggregation_string = [{"$match":{"name": "balaji"
}},{ "$group": {
"_id": None,
"count": { "$sum": "$items.price" }
}}]
db.sample_collection1.aggregate(aggregation_string)
and I am getting count as 0. Can someone please help me here.
In your example since you don't need to group the objects you can simply project the sum this way :
db.collection.aggregate([
{
"$match": {
"name": "balaji"
}
},
{
"$project": {
"name": 1,
"priceTotal": {
"$sum": "$items.price"
}
}
},
])
It should works from mongoDB 3.2 and I think it's the best way.
But if you absolutely need to use the $group, you have to do it this way:
db.collection.aggregate([
{
"$match": {
"name": "balaji"
}
},
{
"$group": {
"_id": null,
"count": {
"$sum": {
"$sum": "$items.price"
}
}
}
}
])
It was your $sum query that was incomplete.
Or with the unwind operator to avoid doing twice the $sum :
db.collection.aggregate([
{
"$match": {
"name": "balaji"
}
},
{
"$unwind": "$items",
},
{
"$group": {
"_id": null,
"count": {
"$sum": "$items.price"
}
}
}
])

Single MongoDB query to aggregate count

I have a collection peopleColl containing records with people data. Each record is uniquely indexed by id and has a managers field of type array.
Example:
{
id: 123,
managers: [456, 789]
},
{
id: 321,
managers: [555, 789]
}
I want to write a single query to find all people with the same manager, for several ids (managers). So given [456, 555, 789] the desired output would be:
{
456: 1,
555: 1,
789: 2
}
I can do it (slowly) in a for-loop in Python as follows:
idToCount = {id: peopleColl.count({"managers": id}) for id in ids}
Edit: I am primarily interested in solutions <= MongoDB 3.4
You can try below aggregation in mongodb 3.4.4 and above
db.collection.aggregate([
{ "$unwind": "$managers" },
{ "$group": { "_id": "$managers", "count": { "$sum": 1 }}},
{ "$group": {
"_id": null,
"data": {
"$push": {
"k": { "$toLower": "$_id" },
"v": "$count"
}
}
}},
{ "$replaceRoot": { "newRoot": { "$arrayToObject": "$data" }}}
])
Output
[
{
"456": 1,
"555": 1,
"789": 2
}
]
You can try below pipeline.
db.collection.aggregate([
{ "$unwind": "$managers" },
{ "$group": { "_id": "$managers", "count": { "$sum": 1 }}}
])
Output:
{'_id': 456, 'count': 1},
{'_id': 555, 'count': 1},
{'_id': 789, 'count': 2}
So you can loop through and create the Id-Count mapping
result = db.collection.aggregate([
{ "$unwind": "$managers" },
{ "$group": { "_id": "$managers", "count": { "$sum": 1 }}}
])
iD_Count = {}
result.forEach(function(d, i) {
iD_Count[d._id] = d.count;
})
iD_Count:
{
456: 1,
555: 1,
789: 2
}
You can try below aggregation in 3.6.
db.colname.aggregate([
{"$unwind":"$managers"},
{"$group":{"_id":"$managers","count":{"$sum":1}}},
{"$group":{
"_id":null,
"managerandcount":{"$mergeObjects":{"$arrayToObject":[[["$_id","$count"]]]}}
}},
{"$replaceRoot":{"newRoot":"$managerandcount"}}
])

Alternative query for MongoDB aggregate function

Here is the sample data that I am working on:
{
"_id": 1,
"user": A,
"nums":[1,2,3,4]
}
{
"_id": 2,
"user": B,
"nums":[1,2,4]
}
{
"_id": 3,
"user": B,
"nums":[4,5,7]
}
What I am trying to get is the number of logs for each user and the distinct "nums" list for each user. So the result is something like this:
[
{
"user": A,
"total": 1,
"nums" : [1,2,3,4]
},
{
"user": B,
"total": 2,
"nums" : [1,2,4,5,7]
}
]
Is that possible to achieve in one aggregate query? I am now using two.
db.test.aggregate([{ $group: { _id:"$user", total:{$sum:1}}}])
db.test.aggregate([{$unwind:"$nums"}, { $group: { _id:"$user", nums:{$addToSet:"$nums"}}}])
Also, should one query be faster than two separate queries on large data set or I should just stay with two queries?
You can do this by assembling a list of the original _id values from the docs in the $group after the $unwind to provide a way to get the total count in a final $project:
db.test.aggregate([
{$unwind: '$nums'},
{$group: {
_id: '$user',
ids: {$addToSet: '$_id'},
nums: {$addToSet: '$nums'}
}},
{$project: {
_id: 0,
user: '$_id',
total: {$size: '$ids'},
nums: 1
}}
])
Result:
[
{
"nums": [
7,
5,
4,
2,
1
],
"user": "B",
"total": 2
},
{
"nums": [
4,
3,
2,
1
],
"user": "A",
"total": 1
}
]
I would expect that doing it all in one aggregate pipeline instead of two will be faster, but it's always best to test it in your own environment to be sure.