I have a collection that looks like this:
{
"id": "id1",
"tags": ['a', 'b']
},
{
"id": "id2",
"tags": ['b', 'c']
},
{
"id": "id3",
"tags": ['a', 'c']
}
How can I make a query that groups by every element in the "tags" array, so that the result looks like this:
{'a': 2},
{'b': 2},
{'c': 2}
(where 2 is the number of times each tag appears, i.e. the count).
Thanks for your help!
You can use this aggregation query:
First, $unwind the array so that each tag becomes a separate document.
Then $group by tag and use $sum: 1 to count the occurrences.
Finally, use $replaceRoot with $arrayToObject to turn each group into the desired {tag: count} shape.
db.collection.aggregate([
{
"$unwind": "$tags"
},
{
"$group": {
"_id": "$tags",
"count": {
"$sum": 1
}
}
},
{
"$replaceRoot": {
"newRoot": {
"$arrayToObject": [
[
{
"k": "$_id",
"v": "$count"
}
]
]
}
}
}
])
If you also want the results sorted by tag (a, b, c, ...), you can add a $sort stage such as { "$sort": { "_id": 1 } } right after the $group stage.
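To make the stages easier to follow, here is a plain-Python sketch of what the pipeline computes on the sample documents. It does not talk to MongoDB; the function name `tag_counts` and its `sort_tags` flag are hypothetical, chosen only for illustration.

```python
from collections import Counter


def tag_counts(docs, sort_tags=False):
    """Plain-Python mimic of $unwind -> $group -> (optional $sort) -> $replaceRoot."""
    # $unwind "$tags" + $group {_id: "$tags", count: {$sum: 1}}
    counts = Counter(tag for doc in docs for tag in doc.get("tags", []))
    # Optional {"$sort": {"_id": 1}} placed right after the $group stage
    tags = sorted(counts) if sort_tags else list(counts)
    # $replaceRoot with $arrayToObject: one {tag: count} document per group
    return [{tag: counts[tag]} for tag in tags]


docs = [
    {"id": "id1", "tags": ["a", "b"]},
    {"id": "id2", "tags": ["b", "c"]},
    {"id": "id3", "tags": ["a", "c"]},
]
print(tag_counts(docs, sort_tags=True))  # [{'a': 2}, {'b': 2}, {'c': 2}]
```

The sort is kept explicit because $group does not promise any particular output order; the Counter here merely happens to keep first-seen order.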
Note: Each collection contains 96.5k documents, and each document has these fields:
{
"name": "variable1",
"startTime": "variable2",
"endTime": "variable3",
"classes": "variable4",
"section": "variable"
}
I have 2 collections. I need to compare them and find out whether certain fields of the documents (here name, startTime and endTime) are the same in both collections.
My approach was to join these 2 collections and then use $lookup. I also tried the following query, but it didn't work.
Please help me.
col1.aggregate([
{
"$unionWith": {"col1": "col2"}
},
{
"$group":
{
"_id":
{
"Name": "$Name",
"startTime": "$startTime",
"endTime": "$endTime"
},
"count": {"$sum": 1},
"doc": {"$first": "$$ROOT"}
}
},
{
"$match": {"$expr": {"$gt": ["$count", 1]}}
},
{
"$replaceRoot": {"newRoot": "$doc"}
},
{
"$out": "newCollectionWithDuplicates"
}
])
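Your approach is fine; you just have a minor syntax error in your $unionWith stage. Also note that field paths are case-sensitive: your documents have a name field, so the $group should use "$name" rather than "$Name". The $unionWith stage is supposed to look like this: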
{
"$unionWith": {
coll: "col2",
pipeline: []
}
}
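As a rough illustration of what the corrected pipeline ($unionWith, then $group on the three fields, then $match on count > 1) computes, here is an in-memory Python sketch. The sample documents and the `shared_documents` name are hypothetical; no database is involved.

```python
from collections import defaultdict

KEY_FIELDS = ("name", "startTime", "endTime")


def shared_documents(col1, col2, fields=KEY_FIELDS):
    """Plain-Python mimic of $unionWith -> $group on `fields` -> $match count > 1."""
    groups = defaultdict(list)
    for doc in col1 + col2:  # $unionWith: documents of both collections in one stream
        key = tuple(doc.get(f) for f in fields)  # the compound $group _id
        groups[key].append(doc)
    # $match {count > 1}, then $replaceRoot with the first document of each group
    return [docs[0] for docs in groups.values() if len(docs) > 1]


col1 = [
    {"name": "math", "startTime": "09:00", "endTime": "10:00", "section": "A"},
    {"name": "art", "startTime": "11:00", "endTime": "12:00", "section": "A"},
]
col2 = [
    {"name": "math", "startTime": "09:00", "endTime": "10:00", "section": "B"},
    {"name": "art", "startTime": "13:00", "endTime": "14:00", "section": "B"},
]
print(shared_documents(col1, col2))  # only the "math" document from col1
```

One thing the sketch makes visible: count > 1 is also true when the same name/startTime/endTime combination occurs twice inside a single collection. If that can happen in your data, you would need to record which collection each document came from (for example with an $addFields stage on col1 and another inside the $unionWith pipeline) and check that both sources are present in the group.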
Below is my collection
[{'_id': ObjectId('603e9cc2784fa0d80d8672cd'),
'name': 'balaji',
'items': [{'price': 1, 'price_range': 'A'},
{'price': 6, 'price_range': 'B'},
{'price': 4, 'price_range': 'C'}]}]
The collection above has only one record. It contains an array named items, whose objects have price and price_range attributes. How can I get the sum of all the prices in this array? I tried the query below, but it did not work:
aggregation_string = [{"$match":{"name": "balaji"
}},{ "$group": {
"_id": None,
"count": { "$sum": "$items.price" }
}}]
db.sample_collection1.aggregate(aggregation_string)
I am getting count as 0. Can someone please help me here?
In your example, since you don't need to group documents, you can simply project the sum:
db.collection.aggregate([
{
"$match": {
"name": "balaji"
}
},
{
"$project": {
"name": 1,
"priceTotal": {
"$sum": "$items.price"
}
}
},
])
This works from MongoDB 3.2 onwards, and I think it's the best way.
But if you really need to use $group, you have to do it this way:
db.collection.aggregate([
{
"$match": {
"name": "balaji"
}
},
{
"$group": {
"_id": null,
"count": {
"$sum": {
"$sum": "$items.price"
}
}
}
}
])
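It was your $sum that was incomplete. Used as a $group accumulator, $sum ignores non-numeric values, and "$items.price" resolves to an array ([1, 6, 4]) for each document, so the result was 0. The inner $sum first adds up each document's array, and the outer $sum then adds those totals together.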
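The difference between the two uses of $sum can be sketched in plain Python. This is only an in-memory imitation of the documented behaviour (the accumulator skips non-numeric values such as arrays; the expression form adds up an array's numeric elements); `accumulate_sum` and `expression_sum` are hypothetical names.

```python
from numbers import Number


def is_numeric(value):
    # $sum skips non-numeric values; Python's bool is a Number, so exclude it explicitly
    return isinstance(value, Number) and not isinstance(value, bool)


def accumulate_sum(values):
    """$sum as a $group accumulator: each non-numeric value (e.g. an array) is ignored."""
    return sum(v for v in values if is_numeric(v))


def expression_sum(value):
    """$sum as an expression on one operand: an array is traversed, its numbers added."""
    if isinstance(value, list):
        return sum(v for v in value if is_numeric(v))
    return value if is_numeric(value) else 0


docs = [{"name": "balaji",
         "items": [{"price": 1, "price_range": "A"},
                   {"price": 6, "price_range": "B"},
                   {"price": 4, "price_range": "C"}]}]

# "$items.price" resolves to one array per document, here [1, 6, 4]
prices = [[item["price"] for item in doc["items"]] for doc in docs]

original = accumulate_sum(prices)                          # arrays are skipped -> 0
fixed = accumulate_sum(expression_sum(p) for p in prices)  # inner $sum first -> 11
print(original, fixed)  # 0 11
```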
Or use $unwind, so that a single $sum is enough:
db.collection.aggregate([
{
"$match": {
"name": "balaji"
}
},
{
"$unwind": "$items",
},
{
"$group": {
"_id": null,
"count": {
"$sum": "$items.price"
}
}
}
])
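Here is a small in-memory sketch of why one $sum is enough after $unwind: every item becomes its own document, so "$items.price" is a plain number. The `unwind` helper is hypothetical and assumes the field holds an array.

```python
def unwind(docs, field):
    """Mimic $unwind: emit one copy of each document per element of doc[field].

    Assumes the field holds an array; a missing, null or empty array emits
    nothing, matching $unwind's default behaviour.
    """
    for doc in docs:
        for element in doc.get(field) or []:
            yield {**doc, field: element}


docs = [{"name": "balaji",
         "items": [{"price": 1, "price_range": "A"},
                   {"price": 6, "price_range": "B"},
                   {"price": 4, "price_range": "C"}]}]

unwound = list(unwind(docs, "items"))              # 3 documents, one per item
total = sum(d["items"]["price"] for d in unwound)  # a single $sum over numbers
print(total)  # 11
```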
I have a collection peopleColl containing records with people data. Each record is uniquely indexed by id and has a managers field of type array.
Example:
{
id: 123,
managers: [456, 789]
},
{
id: 321,
managers: [555, 789]
}
I want to write a single query that, given several manager ids, counts how many people have each of those managers. So given [456, 555, 789], the desired output would be:
{
456: 1,
555: 1,
789: 2
}
I can do it (slowly) in a for-loop in Python as follows:
idToCount = {mgr_id: peopleColl.count_documents({"managers": mgr_id}) for mgr_id in ids}
Edit: I am primarily interested in solutions <= MongoDB 3.4
You can try the aggregation below in MongoDB 3.4.4 and above:
db.collection.aggregate([
{ "$unwind": "$managers" },
{ "$group": { "_id": "$managers", "count": { "$sum": 1 }}},
{ "$group": {
"_id": null,
"data": {
"$push": {
"k": { "$toLower": "$_id" },
"v": "$count"
}
}
}},
{ "$replaceRoot": { "newRoot": { "$arrayToObject": "$data" }}}
])
Output
[
{
"456": 1,
"555": 1,
"789": 2
}
]
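The same computation can be sketched in plain Python to show where each stage fits. Note that the pipeline above counts every manager in the collection; the `manager_ids` filter in this sketch is an addition, and in the pipeline it would correspond to a {"$match": {"managers": {"$in": [...]}}} stage right after $unwind. The function name is hypothetical.

```python
from collections import Counter


def manager_counts(people, manager_ids=None):
    """Plain-Python mimic of the $unwind/$group/$push/$arrayToObject pipeline above."""
    # $unwind "$managers" + $group {_id: "$managers", count: {$sum: 1}}
    counts = Counter(m for person in people for m in person.get("managers", []))
    if manager_ids is not None:
        # Hypothetical extra filter, i.e. {"$match": {"managers": {"$in": ids}}} after $unwind
        wanted = set(manager_ids)
        counts = Counter({m: c for m, c in counts.items() if m in wanted})
    # $push {k, v} into one array, then $arrayToObject: keys must be strings
    return {str(m): c for m, c in counts.items()}


people = [
    {"id": 123, "managers": [456, 789]},
    {"id": 321, "managers": [555, 789]},
]
print(manager_counts(people, [456, 555, 789]))  # {'456': 1, '789': 2, '555': 1}
```

Unlike the Python loop in the question, a manager id that nobody reports to is simply absent from the result rather than mapped to 0.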
You can try the pipeline below:
db.collection.aggregate([
{ "$unwind": "$managers" },
{ "$group": { "_id": "$managers", "count": { "$sum": 1 }}}
])
Output:
{'_id': 456, 'count': 1},
{'_id': 555, 'count': 1},
{'_id': 789, 'count': 2}
You can then loop through the results and build the id-to-count mapping:
result = db.collection.aggregate([
{ "$unwind": "$managers" },
{ "$group": { "_id": "$managers", "count": { "$sum": 1 }}}
])
iD_Count = {}
result.forEach(function(d, i) {
iD_Count[d._id] = d.count;
})
iD_Count:
{
456: 1,
555: 1,
789: 2
}
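Since the question uses Python, the same loop can be written as a dict comprehension over the aggregation results. This is a sketch: `counts_from_results` is a hypothetical helper, and the sample list below stands in for the cursor so the snippet runs without a database.

```python
# Same two stages as the shell example above.
PIPELINE = [
    {"$unwind": "$managers"},
    {"$group": {"_id": "$managers", "count": {"$sum": 1}}},
]


def counts_from_results(results):
    """Turn the {_id, count} documents produced by PIPELINE into {manager_id: count}."""
    return {doc["_id"]: doc["count"] for doc in results}


# Stand-in for what aggregate() yields on the sample data (no database needed here).
sample_results = [
    {"_id": 456, "count": 1},
    {"_id": 555, "count": 1},
    {"_id": 789, "count": 2},
]
print(counts_from_results(sample_results))  # {456: 1, 555: 1, 789: 2}
```

Against a real database you would pass the cursor directly, e.g. `counts_from_results(peopleColl.aggregate(PIPELINE))` with `peopleColl` a PyMongo `Collection`. Because the mapping is built client-side, the keys can stay integers; the string-key restriction only applies to $arrayToObject on the server.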
In MongoDB 3.6 and above, you can also try the aggregation below, which uses $mergeObjects as a $group accumulator:
db.colname.aggregate([
{"$unwind":"$managers"},
{"$group":{"_id":"$managers","count":{"$sum":1}}},
{"$group":{
"_id":null,
"managerandcount":{"$mergeObjects":{"$arrayToObject":[[["$_id","$count"]]]}}
}},
{"$replaceRoot":{"newRoot":"$managerandcount"}}
])
Here is the sample data that I am working on:
{
"_id": 1,
"user": "A",
"nums":[1,2,3,4]
}
{
"_id": 2,
"user": "B",
"nums":[1,2,4]
}
{
"_id": 3,
"user": "B",
"nums":[4,5,7]
}
What I want is the number of logs for each user and the distinct "nums" values for each user, so the result would look something like this:
[
{
"user": "A",
"total": 1,
"nums" : [1,2,3,4]
},
{
"user": "B",
"total": 2,
"nums" : [1,2,4,5,7]
}
]
Is it possible to do this in one aggregate query? I am currently using two:
db.test.aggregate([{ $group: { _id:"$user", total:{$sum:1}}}])
db.test.aggregate([{$unwind:"$nums"}, { $group: { _id:"$user", nums:{$addToSet:"$nums"}}}])
Also, would one query be faster than two separate queries on a large data set, or should I just stay with two queries?
You can do this by assembling a list of the original _id values from the docs in the $group after the $unwind to provide a way to get the total count in a final $project:
db.test.aggregate([
{$unwind: '$nums'},
{$group: {
_id: '$user',
ids: {$addToSet: '$_id'},
nums: {$addToSet: '$nums'}
}},
{$project: {
_id: 0,
user: '$_id',
total: {$size: '$ids'},
nums: 1
}}
])
Result:
[
{
"nums": [
7,
5,
4,
2,
1
],
"user": "B",
"total": 2
},
{
"nums": [
4,
3,
2,
1
],
"user": "A",
"total": 1
}
]
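To see why collecting the original _id values gives the per-user total, here is an in-memory Python sketch of the same stages. `summarize` is a hypothetical name, and the nums lists are sorted only for readability, since $addToSet does not guarantee any order.

```python
from collections import defaultdict


def summarize(docs):
    """Plain-Python mimic of $unwind nums -> $group by user with two $addToSet -> $project."""
    ids = defaultdict(set)   # $addToSet: "$_id"
    nums = defaultdict(set)  # $addToSet: "$nums"
    for doc in docs:
        for n in doc.get("nums", []):  # $unwind: one row per array element
            ids[doc["user"]].add(doc["_id"])
            nums[doc["user"]].add(n)
    # $project: total = $size of the distinct _id set
    return [
        {"user": user, "total": len(ids[user]), "nums": sorted(nums[user])}
        for user in ids
    ]


docs = [
    {"_id": 1, "user": "A", "nums": [1, 2, 3, 4]},
    {"_id": 2, "user": "B", "nums": [1, 2, 4]},
    {"_id": 3, "user": "B", "nums": [4, 5, 7]},
]
for row in summarize(docs):
    print(row)
# {'user': 'A', 'total': 1, 'nums': [1, 2, 3, 4]}
# {'user': 'B', 'total': 2, 'nums': [1, 2, 4, 5, 7]}
```

A side effect worth knowing: a document whose nums array is empty or missing produces no unwound rows, so it is not counted in total, both here and in the pipeline, unless $unwind is given the preserveNullAndEmptyArrays option.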
I would expect that doing it all in one aggregate pipeline instead of two will be faster, but it's always best to test it in your own environment to be sure.