How to get distinct name and count in MongoDB using PyMongo

I have the collection shown below. All I want is each distinct "Name" and its count. For example, Betty appears 2 times, so the output I want is Betty:2, Vic:1, Veronica:2. I can get the distinct names with "db.Car.find().distinct('Name')", but I am not sure how to get the count.
{
"Name": "Betty",
"Car": "Jeep",
}
{
"Name": "Betty",
"Car": "Van",
}
{
"Name": "Vic",
"Car": "Ferrari",
}
{
"Name": "Veronica",
"Car": "Bus",
}
{
"Name": "Veronica",
"Car": "Van",
}

You can use $group to group by the Name field, with the $sum operator inside it to compute the Count field.
Something like this:
db.collection.aggregate([
{
"$group": {
"_id": "$Name",
"Count": {
"$sum": 1
}
}
},
{
"$project": {
"Name": "$_id",
"Count": 1,
"_id":0
}
}
])
The above will produce the following output:
[
{
"Count": 2,
"Name": "Betty"
},
{
"Count": 1,
"Name": "Vic"
},
{
"Count": 2,
"Name": "Veronica"
}
]
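Since the question asks about PyMongo, the same pipeline could be run from Python roughly as follows. This is only a sketch: `count_by_name` is a hypothetical helper name, and `collection` is assumed to be a PyMongo `Collection` object for the Car collection, obtained however your application connects.

```python
# The same two stages as above, written as Python dicts.
PIPELINE = [
    {"$group": {"_id": "$Name", "Count": {"$sum": 1}}},
    {"$project": {"Name": "$_id", "Count": 1, "_id": 0}},
]


def count_by_name(collection):
    """Return {name: count} from any object exposing aggregate(pipeline).

    `collection` is assumed to be a pymongo Collection; this hypothetical
    helper does not import pymongo itself.
    """
    return {doc["Name"]: doc["Count"] for doc in collection.aggregate(PIPELINE)}
```

With the sample data, the returned dict would map each name to its count, which is close to the "Betty:2, Vic:1, Veronica:2" shape the question asks for.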

Related

2-level group by for objects in an array

Good day SO Community,
I would like to ask for your help in creating the correct aggregation pipeline for sample data:
[
{
"group": "A",
"subgroup": "A1",
"name": "Abby"
},
{
"group": "A",
"subgroup": "A2",
"name": "Andy"
},
{
"group": "A",
"subgroup": "A2",
"name": "Amber"
},
{
"group": "B",
"subgroup": "B1",
"name": "Bart"
}
]
I want to group by group first, then for each group, group by subgroup.
Each name should go into its respective subgroup, and every count should reflect the actual number of names.
My expected output is as follows:
[
{
"_id": "B",
"count": 1,
"subgroup": [
{
"_id": "B1",
"count": 1,
"names": ["Bart"]
}
]
},
{
"_id": "A",
"count": 3,
"subgroup": [
{
"_id": "A1",
"count": 1,
"names":[ "Abby"]
},
{
"_id": "A2",
"count": 2,
"names": ["Amber", "Andy"]
}
]
}
]
I have tried this pipeline but it's not grouping the subgroups.
{
"$group": {
"_id": "$group",
"subgroup": {
"$addToSet": {
"_id": "$subgroup",
"name": "$name",
count: {
$sum: 1
}
}
},
count: {
$sum: 1
}
}
}
The aggregation pipeline and actual output can be seen in the playground:
https://mongoplayground.net/p/MO1fCf21Rez
Thank you!
$group - Group by group and subgroup. Perform count and add name into names array.
$group - Group by group. Perform total count and add the object for subgroup into subgroup array.
db.students.aggregate([
{
$group: {
_id: {
group: "$group",
subgroup: "$subgroup"
},
names: {
$push: "$name"
},
count: {
$sum: 1
}
}
},
{
"$group": {
"_id": "$_id.group",
"subgroup": {
$addToSet: {
"_id": "$_id.subgroup",
"names": "$names",
count: "$count"
}
},
count: {
$sum: "$count"
}
}
}
])
Demo @ Mongo Playground
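To see what the two $group stages compute, here is a pure-Python sketch over the sample data. It does not talk to MongoDB; `two_level_group` is a hypothetical name, and the order of names here simply follows input order, whereas $addToSet does not guarantee any order.

```python
from collections import defaultdict

students = [
    {"group": "A", "subgroup": "A1", "name": "Abby"},
    {"group": "A", "subgroup": "A2", "name": "Andy"},
    {"group": "A", "subgroup": "A2", "name": "Amber"},
    {"group": "B", "subgroup": "B1", "name": "Bart"},
]


def two_level_group(docs):
    # Stage 1: group by (group, subgroup) and collect the names.
    names = defaultdict(list)
    for doc in docs:
        names[(doc["group"], doc["subgroup"])].append(doc["name"])

    # Stage 2: regroup by group, nesting each subgroup and summing counts.
    result = {}
    for (group, subgroup), members in names.items():
        entry = result.setdefault(group, {"_id": group, "count": 0, "subgroup": []})
        entry["subgroup"].append(
            {"_id": subgroup, "count": len(members), "names": members}
        )
        entry["count"] += len(members)
    return list(result.values())
```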

Find the duplicate field name records present inside the nested array object in mongodb

My collection :
[
{
"Empname": "Doug",
"Group": [
{
"Category": [
{
"Categoryid": 123,
"Categoryname": "science"
},
{
"Categoryid": 233,
"Categoryname": "Maths"
}
]
}
]
},
{
"Empname": "stark",
"Group": [
{
"Category": [
{
"Categoryid": 123,
"Categoryname": "science"
},
{
"Categoryid": 144,
"Categoryname": "language "
}
]
}
]
}
]
I want to display the duplicates.
Here, Categoryid 123 with Categoryname science is present twice. I want to display this duplicate like:
{"categoryname":"science","count":2}
You have to perform $unwind twice to flatten the nested arrays, then $group on the Category fields.
The query below is what you are looking for:
db.collection.aggregate([
{
"$unwind": "$Group"
},
{
"$unwind": "$Group.Category"
},
{
"$group": {
"_id": {
"Categoryid": "$Group.Category.Categoryid",
"Categoryname": "$Group.Category.Categoryname",
},
"count": {
"$sum": 1
}
}
},
])
Mongo Playground Sample Execution
Edit: Improved with @Takis_'s suggestion
db.collection.aggregate([
{
"$unwind": "$Group"
},
{
"$unwind": "$Group.Category"
},
{
"$group": {
"_id": {
"Categoryid": "$Group.Category.Categoryid",
"Categoryname": "$Group.Category.Categoryname",
},
"count": {
"$sum": 1
}
}
},
{
"$match": {
"count": {
"$gt": 1
}
}
},
{
"$project": {
"_id": 0,
"Categoryname": "$_id.Categoryname",
"count": 1
}
}
])
Mongo Playground Sample Execution
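As a cross-check of the expected result, the same logic can be sketched in plain Python over the sample documents. It does not use MongoDB, and `duplicate_categories` is a hypothetical name.

```python
from collections import Counter

employees = [
    {"Empname": "Doug", "Group": [{"Category": [
        {"Categoryid": 123, "Categoryname": "science"},
        {"Categoryid": 233, "Categoryname": "Maths"},
    ]}]},
    {"Empname": "stark", "Group": [{"Category": [
        {"Categoryid": 123, "Categoryname": "science"},
        {"Categoryid": 144, "Categoryname": "language "},
    ]}]},
]


def duplicate_categories(docs):
    # The inner two `for` clauses play the role of the two $unwind stages;
    # Counter does the $group with $sum: 1.
    counts = Counter(
        (cat["Categoryid"], cat["Categoryname"])
        for doc in docs
        for group in doc["Group"]
        for cat in group["Category"]
    )
    # Keep keys seen more than once ($match), then reshape ($project).
    return [
        {"Categoryname": name, "count": n}
        for (_category_id, name), n in counts.items()
        if n > 1
    ]
```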

Count the number of duplicate elements in MongoDB

I have the collection below in MongoDB:
{
"Id": "5",
"Group": [
{
"Name": "frank",
"Roll": "123"
}
]
},
{
"Id": "6",
"Group": [
{
"Name": "John",
"Roll": "124"
}
]
},
{
"Id": "7",
"Group": [
{
"Name": "John",
"Roll": "125"
}
]
}
The name "John" appears twice. I would like to display the count for each name that appears more than once:
{"Name": "John", "Count":2 }
You can use this aggregation query:
First, $unwind deconstructs the array so that each element becomes its own document.
Then $group by the name and $sum 1 for each occurrence.
Then $match keeps only the names that occur more than once (i.e. are repeated).
The last stage, $project, outputs the values you want, in this case Name and Count.
db.collection.aggregate([
{
"$unwind": "$Group"
},
{
"$group": {
"_id": "$Group.Name",
"Count": {
"$sum": 1
},
}
},
{
"$match": {
"Count": {
"$gt": 1
}
}
},
{
"$project": {
"_id": 0,
"Name": "$_id",
"Count": 1
}
}
])
Example here
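If the same pattern is needed for other array fields, the four stages could be generated from Python. The sketch below only builds the pipeline as plain dicts; `duplicates_pipeline` and its parameters are hypothetical names, not part of any library.

```python
def duplicates_pipeline(array_field, key_field):
    """Build the unwind/group/match/project stages as plain Python dicts."""
    return [
        # One document per array element.
        {"$unwind": f"${array_field}"},
        # Count occurrences of each key value.
        {"$group": {"_id": f"${array_field}.{key_field}", "Count": {"$sum": 1}}},
        # Keep only values that occur more than once.
        {"$match": {"Count": {"$gt": 1}}},
        # Rename _id back to the key name.
        {"$project": {"_id": 0, key_field: "$_id", "Count": 1}},
    ]


pipeline = duplicates_pipeline("Group", "Name")
```

With PyMongo, the resulting list could then be passed to `collection.aggregate(pipeline)`.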

Get Distinct Document by Max Value of a Field

I have a requirement to query on two fields: one should be unique in the result and the other should be at its maximum.
Here is my sample collection
{
"_id": ObjectId('59537b7fe08062b9ee8dfdf6'),
"admin": {
"model": "abc",
"version": "00",
"name":"john",
"age":"30"
}
}
{
"_id": ObjectId('59537b7fe08062b9ee8dfdf7'),
"admin": {
"model": "abc",
"version": "01" ,
"name":"john",
"age":"30"
}
}
{
"_id": ObjectId('59537b7fe08062b9ee8dfdf8'),
"admin": {
"model": "def",
"version": "00" ,
"name":"cena",
"age":"30"
}
}
I have two documents with the same model but different versions. I want to query for each model with its maximum version. I tried simply sorting by version, but that does not work for me.
I am expecting output like this
{
"_id": ObjectId('59537b7fe08062b9ee8dfdf7'),
"admin": {
"model": "abc",
"version": "01" ,
"name":"john",
"age":"30"
}
}
{
"_id": ObjectId('59537b7fe08062b9ee8dfdf8'),
"admin": {
"model": "def",
"version": "00" ,
"name":"cena",
"age":"30"
}
}
Any suggestions will be really helpful.
As Neil said, it is $sort, $group, and $replaceRoot, but with correct values in the query:
db.collection.aggregate([
{ "$sort": { "admin.version": -1 } },
{ "$group": {
"_id": "$admin.model" ,
"admin": { "$first": "$$ROOT" }
}},
{ "$replaceRoot": { "newRoot": "$admin" } }
])
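The sort-then-take-first idea can be checked in plain Python against the sample documents. This is a sketch that does not use MongoDB: `latest_per_model` is a hypothetical name and the ObjectIds are written as plain strings. Note that "version" is a string, so the comparison is lexicographic; that works for zero-padded values such as "00" and "01" but would not order "9" and "10" numerically.

```python
docs = [
    {"_id": "59537b7fe08062b9ee8dfdf6",
     "admin": {"model": "abc", "version": "00", "name": "john", "age": "30"}},
    {"_id": "59537b7fe08062b9ee8dfdf7",
     "admin": {"model": "abc", "version": "01", "name": "john", "age": "30"}},
    {"_id": "59537b7fe08062b9ee8dfdf8",
     "admin": {"model": "def", "version": "00", "name": "cena", "age": "30"}},
]


def latest_per_model(documents):
    latest = {}
    # Sort descending by version ($sort), then keep the first document
    # seen for each model ($group with $first on $$ROOT).
    ordered = sorted(documents, key=lambda d: d["admin"]["version"], reverse=True)
    for doc in ordered:
        latest.setdefault(doc["admin"]["model"], doc)
    return list(latest.values())
```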

How to $push a field depending on a condition?

I'm trying to conditionally push a field into an array during the $group stage of the MongoDB aggregation pipeline.
Essentially I have documents with the name of the user, and an array of the actions they performed.
If I group the user actions like this:
{ $group: { _id: { "name": "$user.name" }, "actions": { $push: "$action" } } }
I get the following:
[{
"_id": {
"name": "Bob"
},
"actions": ["add", "wait", "subtract"]
}, {
"_id": {
"name": "Susan"
},
"actions": ["add"]
}, {
"_id": {
"name": "Susan"
},
"actions": ["add", "subtract"]
}]
So far so good. The idea would be to now group together the actions array to see which set of user actions are the most popular. The problem is that I need to remove the "wait" action before taking into account the group. Therefore the result should be something like this, taking into account that the "wait" element should not be considered in the grouping:
[{
"_id": ["add"],
"total": 1
}, {
"_id": ["add", "subtract"],
"total": 2
}]
Test #1
If I add this $group stage:
{ $group : { _id : "$actions", total: { $sum: 1} }}
I get the count that I want, but it takes into account the unwanted "wait" array element.
[{
"_id": ["add"],
"total": 1
}, {
"_id": ["add", "subtract"],
"total": 1
}, {
"_id": ["add", "wait", "subtract"],
"total": 1
}]
Test #2
{ $group: { _id: { "name": "$user.name" }, "actions": { $push: { $cond: { if:
{ $ne: [ "$action", 'wait']}, then: "$action", else: null } }}} }
{ $group : { _id : "$actions", total: { $sum: 1} }}
This is as close as I've gotten, but this pushes null values where the wait would be, and I can't figure out how to remove them.
[{
"_id": ["add"],
"total": 1
}, {
"_id": ["add", "subtract"],
"total": 1
}, {
"_id": ["add", null, "subtract"],
"total": 1
}]
UPDATE:
My simplified documents look like this:
{
"_id": ObjectID("573e0c6155e2a8f9362fb8ff"),
"user": {
"name": "Bob",
},
"action": "add",
}
You need a preliminary $match stage in your pipeline to select only those documents where "action" is not equal to "wait".
db.collection.aggregate([
{ "$match": { "action": { "$ne": "wait" } } },
{ "$group": {
"_id": "$user.name",
"actions": { "$push": "$action" },
"total": { "$sum": 1 }
}}
])
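To check the counts the question expects, the filter-first approach can be sketched in plain Python, followed by the second grouping on the action lists. It does not use MongoDB: `popular_action_sets` is a hypothetical name, and the sample events (including the user "Ann") are made up for illustration.

```python
from collections import Counter, defaultdict

# Made-up events in the simplified document shape from the question.
events = [
    {"user": {"name": "Bob"}, "action": "add"},
    {"user": {"name": "Bob"}, "action": "wait"},
    {"user": {"name": "Bob"}, "action": "subtract"},
    {"user": {"name": "Susan"}, "action": "add"},
    {"user": {"name": "Ann"}, "action": "add"},
    {"user": {"name": "Ann"}, "action": "subtract"},
]


def popular_action_sets(docs, skip="wait"):
    per_user = defaultdict(list)
    for doc in docs:
        # Drop the unwanted action before grouping (the $match stage).
        if doc["action"] != skip:
            # Collect each user's remaining actions ($group with $push).
            per_user[doc["user"]["name"]].append(doc["action"])
    # Second grouping: count identical action sequences across users.
    return Counter(tuple(actions) for actions in per_user.values())
```

Because "wait" is removed before the first grouping, Bob's actions collapse to the same sequence as Ann's and are counted together.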