Find documents that share one key but differ in another - mongodb

I have a MongoDB collection that resembles:
{"dept":"A" , "email":"bob#example.com", "userID": "1"}
{"dept":"A" , "email":"bob#example.com", "userID": "1"}
{"dept":"A" , "email":"bob#example.com", "userID": "2"} <<< "bad" record
{"dept":"A" , "email":"alice#example.com", "userID": "3"}
{"dept":"B" , "email":"bob#example.com", "userID": "4"}
{"dept":"B" , "email":"kevin#example.com", "userID": "5"}
The constraint is that an email must only have a single userID per department.
How would I query the table to find which emails have multiple userIDs within a department? Mongo 4.4+

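Use two $group stages followed by a $match. The first $group collapses exact duplicates of (dept, email, userID). The second groups by dept and email and collects the distinct userIDs, and the $match keeps only the pairs with more than one userID. Grouping on both dept and email in the second stage keeps the check scoped to a single department.
db.collection.aggregate([
  // Stage 1: collapse exact duplicates of (dept, email, userID).
  {
    "$group": {
      "_id": { "dept": "$dept", "email": "$email", "userID": "$userID" },
      "individualCount": { "$sum": 1 }
    }
  },
  // Stage 2: collect the distinct userIDs for each (dept, email) pair.
  {
    "$group": {
      "_id": { "dept": "$_id.dept", "email": "$_id.email" },
      "userIDs": { "$addToSet": "$_id.userID" },
      "totalRecordsCount": { "$sum": "$individualCount" },
      "totalDuplicCounts": { "$sum": 1 }
    }
  },
  // Stage 3: keep only the pairs that have more than one distinct userID.
  {
    "$match": { "totalDuplicCounts": { "$gt": 1 } }
  }
])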
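To make the expected outcome concrete, here is a plain JavaScript sketch (no database involved) that applies the rule from the question, one userID per email within a department, to the sample documents. The helper name findConflicts is hypothetical and only illustrates the grouping logic; it is not a MongoDB API.

```javascript
// Sample documents copied from the question.
const docs = [
  { dept: "A", email: "bob#example.com", userID: "1" },
  { dept: "A", email: "bob#example.com", userID: "1" },
  { dept: "A", email: "bob#example.com", userID: "2" },
  { dept: "A", email: "alice#example.com", userID: "3" },
  { dept: "B", email: "bob#example.com", userID: "4" },
  { dept: "B", email: "kevin#example.com", userID: "5" },
];

// Hypothetical helper: collect the distinct userIDs for each (dept, email)
// pair and return only the pairs that have more than one.
function findConflicts(documents) {
  const seen = new Map();
  for (const { dept, email, userID } of documents) {
    const key = JSON.stringify([dept, email]);
    if (!seen.has(key)) seen.set(key, new Set());
    seen.get(key).add(userID);
  }
  const conflicts = [];
  for (const [key, ids] of seen) {
    if (ids.size > 1) {
      const [dept, email] = JSON.parse(key);
      conflicts.push({ dept, email, userIDs: [...ids].sort() });
    }
  }
  return conflicts;
}

const conflicts = findConflicts(docs);
console.log(conflicts);
```

Only the (A, bob#example.com) pair is reported. The same email in department B has a single userID, so it does not count as a violation.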

Related

How to get count by order in mongodb aggregate?

I have two collections named listings and moods.
listings sample:
{
"_id": ObjectId("5349b4ddd2781d08c09890f3"),
"name": "Hotel Radisson Blu",
"moods": [
ObjectId("507f1f77bcf86cd799439010"),
ObjectId("507f1f77bcf86cd799439011")
]
}
moods sample:
{
"_id": ObjectId("507f1f77bcf86cd799439011"),
"name": "Sports"
},
{
"_id": ObjectId("507f1f77bcf86cd799439010"),
"name": "Spanish Food"
},
{
"_id": ObjectId("507f1f77bcf86cd799439009"),
"name": "Action"
}
I need this result:
{
"_id": ObjectId("507f1f77bcf86cd799439011"),
"name": "Sports",
"count": 1
},
{
"_id": ObjectId("507f1f77bcf86cd799439010"),
"name": "Spanish Food",
"count": 1
},
{
"_id": ObjectId("507f1f77bcf86cd799439009"),
"name": "Action",
"count": 0
}
That is, each mood with a count of the listings that reference it. I have no idea how to use aggregate.
You can do it using aggregate():
$lookup joins the listings collection, with a $match sub-pipeline that checks whether the mood's _id appears in the listing's moods array.
$addFields then replaces count with the $size of the count array returned by the lookup.
db.moods.aggregate([
  {
    "$lookup": {
      "from": "listings",
      "as": "count",
      "let": { "id": "$_id" },
      "pipeline": [
        {
          "$match": {
            "$expr": { "$in": ["$$id", "$moods"] }
          }
        }
      ]
    }
  },
  {
    "$addFields": {
      "count": { "$size": "$count" }
    }
  }
])
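As a rough illustration of what the lookup and size steps compute, the following plain JavaScript sketch does the same count in memory. It assumes short string ids in place of the ObjectId values, and countListingsPerMood is a hypothetical helper name.

```javascript
// Stand-in data: string ids replace the ObjectId values from the question.
const listings = [
  { _id: "listing1", name: "Hotel Radisson Blu", moods: ["mood10", "mood11"] },
];
const moods = [
  { _id: "mood11", name: "Sports" },
  { _id: "mood10", name: "Spanish Food" },
  { _id: "mood09", name: "Action" },
];

// Hypothetical helper: for each mood, count the listings whose moods array
// contains that mood's _id. Moods with no listing get a count of 0.
function countListingsPerMood(moodDocs, listingDocs) {
  return moodDocs.map((mood) => ({
    ...mood,
    count: listingDocs.filter((l) => l.moods.includes(mood._id)).length,
  }));
}

const moodCounts = countListingsPerMood(moods, listings);
console.log(moodCounts);
```

Starting from moods rather than listings is what lets a mood with no listings still appear, with a count of 0.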
Did this work?
db.collection.aggregate().count()
Try combining the functions; it might work.

How to collate teams by a property of the leader, in MongoDB

I have a database with some users, who belong to teams. Each team has a leader. Each user has a subject.
I want to collate teams by the leader's subject.
My data looks like this:
db={
"teams": [
{
_id: "t1",
members: [
{
"_id": "u1",
"leader": true
},
{
"_id": "u2"
},
{
"_id": "u3"
}
],
},
{
_id: "t2",
members: [
{
"_id": "u2",
"leader": true
},
{
"_id": "u4"
}
],
},
{
_id: "t3",
members: [
{
"_id": "u1",
"leader": true
},
{
"_id": "u4"
}
],
},
{
_id: "t4",
members: [
{
"_id": "u2",
"leader": true
}
],
},
],
"users": [
{
"_id": "u1",
"subject": "history"
},
{
"_id": "u2",
"subject": "maths"
},
{
"_id": "u3",
"subject": "geography"
},
{
"_id": "u4",
"subject": "french"
}
]
}
The result I want is:
{
"history": ["t1", "t3"],
"maths": ["t2", "t4"]
}
I have an aggregation that gets me the _id of every leader, and from there I can get the result I want in stages, by first finding the subject of every leader, then going back through the projects and assigning a subject to each project based on the identity of the leader. It works, but it is inelegant and I think it will be slow. It seems to me there should be some better way to do this, maybe something like a join?
Is there a nifty way to get the result I want from a single MongoDB operation?
Here is a Mongo Playground with my data:
https://mongoplayground.net/p/SIJv9-hVNzJ
Many thanks for any help.
Edit: my test data are confusing because '_id' is used in both collections, making it hard to unpack the answer. Here is an updated Mongo Playground that uses different key names for each collection and helped me to understand the perfect answer.
Yes, you should join your collections on users._id with a $lookup, and then turn each subject value into a key with $arrayToObject (introduced in MongoDB 3.4.4).
Here is a possible way to do this:
db.teams.aggregate([
{
"$unwind": "$members"
},
{
"$match": {
"members.leader": true
}
},
{
"$lookup": {
"from": "users",
"localField": "members._id",
"foreignField": "_id",
"as": "users"
}
},
{
"$unwind": "$users"
},
{
"$group": {
"_id": "$users.subject",
"team": {
"$push": "$_id"
}
}
},
{
"$replaceRoot": {
"newRoot": {
"$arrayToObject": [
[
{
k: "$_id",
v: "$team"
}
]
]
}
}
}
])
try it online: mongoplayground.net/p/TuEpMzHkI-0
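For comparison, here is the same join written as a plain JavaScript sketch over the sample data from the question, with no database involved. The helper name teamsByLeaderSubject is hypothetical. Note that it builds a single object, as in the desired result, whereas a pipeline that ends with $group and $replaceRoot emits one document per subject.

```javascript
// Sample data copied from the question.
const teams = [
  { _id: "t1", members: [{ _id: "u1", leader: true }, { _id: "u2" }, { _id: "u3" }] },
  { _id: "t2", members: [{ _id: "u2", leader: true }, { _id: "u4" }] },
  { _id: "t3", members: [{ _id: "u1", leader: true }, { _id: "u4" }] },
  { _id: "t4", members: [{ _id: "u2", leader: true }] },
];
const users = [
  { _id: "u1", subject: "history" },
  { _id: "u2", subject: "maths" },
  { _id: "u3", subject: "geography" },
  { _id: "u4", subject: "french" },
];

// Hypothetical helper: look up each team's leader and group the team ids
// under the leader's subject.
function teamsByLeaderSubject(teamDocs, userDocs) {
  const subjectOf = new Map(userDocs.map((u) => [u._id, u.subject]));
  const result = {};
  for (const team of teamDocs) {
    const leader = team.members.find((m) => m.leader === true);
    // Skip teams without a leader or with an unknown leader id.
    if (!leader || !subjectOf.has(leader._id)) continue;
    const subject = subjectOf.get(leader._id);
    if (!result[subject]) result[subject] = [];
    result[subject].push(team._id);
  }
  return result;
}

const grouped = teamsByLeaderSubject(teams, users);
console.log(grouped);
```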

mongo db How to group data like this?

I have this collection (no two of the fields A, B, C appear in the same document):
{_id:1,A:a}
{_id:2,B:b}
{_id:3,C:a}
{_id:4,A:a}
{_id:5,A:b}
{_id:6,A:c}
{_id:7,C:a}
How to get this result?
a:4
b:2
c:1
You can get this result with the MongoDB aggregation framework.
First, put all the values into a single array field, then $unwind it, drop the null entries left by missing fields, and perform a $group on that field:
db.collection.aggregate([{
"$project": {
"v": ["$A", "$B", "$C"]
}
}, {
"$unwind": "$v"
}, {
"$match": {
"v": {
"$ne": null
}
}
}, {
"$group": {
"_id": "$v",
"count": {
"$sum": 1
}
}
}])
result:
[
{
"_id": "c",
"count": 1
},
{
"_id": "b",
"count": 2
},
{
"_id": "a",
"count": 4
}
]
you can try it here: mongoplayground.net/p/rGHUPWsw2ee
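The same counting can be sketched in plain JavaScript to show what the stages compute. This assumes the values in the question are strings (the question writes them unquoted), and countValues is a hypothetical helper name.

```javascript
// Sample data from the question, with the values assumed to be strings.
const docs = [
  { _id: 1, A: "a" },
  { _id: 2, B: "b" },
  { _id: 3, C: "a" },
  { _id: 4, A: "a" },
  { _id: 5, A: "b" },
  { _id: 6, A: "c" },
  { _id: 7, C: "a" },
];

// Hypothetical helper: read every listed field from every document,
// skip fields that are missing or null, and count each value.
function countValues(documents, fields) {
  const counts = {};
  for (const doc of documents) {
    for (const field of fields) {
      const value = doc[field];
      if (value === undefined || value === null) continue;
      counts[value] = (counts[value] || 0) + 1;
    }
  }
  return counts;
}

const counts = countValues(docs, ["A", "B", "C"]);
console.log(counts);
```

Skipping missing fields here plays the role of the $match on null in the pipeline.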

How to $push a field depending on a condition?

I'm trying to conditionally push a field into an array during the $group stage of the MongoDB aggregation pipeline.
Essentially I have documents with the name of the user, and an array of the actions they performed.
If I group the user actions like this:
{ $group: { _id: { "name": "$user.name" }, "actions": { $push: "$action" } } }
I get the following:
[{
"_id": {
"name": "Bob"
},
"actions": ["add", "wait", "subtract"]
}, {
"_id": {
"name": "Susan"
},
"actions": ["add"]
}, {
"_id": {
"name": "Susan"
},
"actions": ["add", "subtract"]
}]
So far so good. The idea would be to now group together the actions array to see which set of user actions are the most popular. The problem is that I need to remove the "wait" action before taking into account the group. Therefore the result should be something like this, taking into account that the "wait" element should not be considered in the grouping:
[{
"_id": ["add"],
"total": 1
}, {
"_id": ["add", "subtract"],
"total": 2
}]
Test #1
If I add this $group stage:
{ $group : { _id : "$actions", total: { $sum: 1} }}
I get the count that I want, but it takes into account the unwanted "wait" array element.
[{
"_id": ["add"],
"total": 1
}, {
"_id": ["add", "subtract"],
"total": 1
}, {
"_id": ["add", "wait", "subtract"],
"total": 1
}]
Test #2
{ $group: { _id: { "name": "$user.name" }, "actions": { $push: { $cond: { if:
{ $ne: [ "$action", 'wait']}, then: "$action", else: null } }}} }
{ $group : { _id : "$actions", total: { $sum: 1} }}
This is as close as I've gotten, but this pushes null values where the wait would be, and I can't figure out how to remove them.
[{
"_id": ["add"],
"total": 1
}, {
"_id": ["add", "subtract"],
"total": 1
}, {
"_id": ["add", null, "subtract"],
"total": 1
}]
UPDATE:
My simplified documents look like this:
{
"_id": ObjectID("573e0c6155e2a8f9362fb8ff"),
"user": {
"name": "Bob",
},
"action": "add",
}
You need a preliminary $match stage in your pipeline to select only those documents where "action" is not equal to "wait".
db.collection.aggregate([
{ "$match": { "action": { "$ne": "wait" } } },
{ "$group": {
"_id": "$user.name",
"actions": { "$push": "$action" },
"total": { "$sum": 1 }
}}
])
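To show how filtering first affects the later count of action sets, here is a plain JavaScript sketch of the whole flow: drop "wait", group actions per user, then count identical action lists. The sample events and the user name "Carol" are assumptions made up for illustration, and countActionSets is a hypothetical helper name.

```javascript
// Hypothetical events shaped like the simplified documents in the question.
const events = [
  { user: { name: "Bob" }, action: "add" },
  { user: { name: "Bob" }, action: "wait" },
  { user: { name: "Bob" }, action: "subtract" },
  { user: { name: "Susan" }, action: "add" },
  { user: { name: "Carol" }, action: "add" },
  { user: { name: "Carol" }, action: "subtract" },
];

// Hypothetical helper: filter out the ignored action, collect each user's
// remaining actions in order, then count how many users share each list.
function countActionSets(eventDocs, ignoredAction) {
  const perUser = new Map();
  for (const { user, action } of eventDocs) {
    if (action === ignoredAction) continue; // same role as the $match stage
    if (!perUser.has(user.name)) perUser.set(user.name, []);
    perUser.get(user.name).push(action);
  }
  const totals = new Map();
  for (const actions of perUser.values()) {
    const key = JSON.stringify(actions);
    totals.set(key, (totals.get(key) || 0) + 1);
  }
  return [...totals].map(([key, total]) => ({ _id: JSON.parse(key), total }));
}

const totals = countActionSets(events, "wait");
console.log(totals);
```

Because "wait" is removed before grouping, no null placeholder ever enters the per-user arrays, so Bob and Carol end up with the same list.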

Filtering a list of votes where more than x matches are found

I have the following vote data in a large collection:
{
"user_id" : ObjectId("53ac7bce4eaf6de4d5601c1a"),
"article_id" : ObjectId("53ab27504eaf6de4d5601be5"),
"score" : 5
},
{
"user_id" : ObjectId("53ac7bce4eaf6de4d5601c1b"),
"article_id" : ObjectId("53ab27504eaf6de4d5601be5"),
"score" : 3
},
{
"user_id" : ObjectId("53ac7bce4eaf6de4d5601c1c"),
"article_id" : ObjectId("53ab27504eaf6de4d5601be5"),
"score" : 3
},
...
I'm looking to filter this collection where more than 3 votes have been obtained for a single article (as above) and output as-is (excluding any vote entries on articles < 3 total votes).
Any help much appreciated. This collection can be huge so efficiency would be ideal.
Normally not something you do in a single operation, but you can do this if those really are your only fields and there are not too many matching documents.
db.collection.aggregate([
{ "$group": {
"_id": "$article_id",
"docs": {
"$push": {
"user_id": "$user_id",
"article_id": "$article_id",
"score": "$score"
}
},
"votes": { "$sum": 1 }
}},
{ "$match": { "votes": { "$gt": 3 } } },
{ "$unwind": "$docs" },
{ "$project": {
"user_id": "$docs.user_id",
"article_id": "$docs.article_id",
"score": "$docs.score"
}}
])
You can clean that up a little with MongoDB 2.6 and greater, which provides the $$ROOT system variable in the pipeline:
db.collection.aggregate([
{ "$group": {
"_id": "$article_id",
"docs": {
"$push": "$$ROOT"
},
"votes": { "$sum": 1 }
}},
{ "$match": { "votes": { "$gt": 3 } } },
{ "$unwind": "$docs" },
{ "$project": {
"user_id": "$docs.user_id",
"article_id": "$docs.article_id",
"score": "$docs.score"
}}
])
Otherwise you can accept that you are doing this in a few steps and process the list of "article_id" values returned with a "votes" count greater than three:
var ids = db.collection.aggregate([
{ "$group": {
"_id": "$article_id",
"votes": { "$sum": 1 }
}},
{ "$match": { "votes": { "$gt": 3 } } },
]).toArray().map(function(x){ return x._id });
db.collection.find({ "article_id": { "$in": ids } })
In shell versions earlier than 2.6, aggregate() returned a single document by default rather than a cursor, so there you would read the list from its "result" key instead of calling toArray().
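The two-step approach can be sketched in plain JavaScript to make the logic explicit: first count votes per article, then keep only the votes whose article passes the threshold. The sample votes use short string ids in place of ObjectId values and are made up for illustration; votesForPopularArticles is a hypothetical helper name.

```javascript
// Hypothetical votes; string ids stand in for the ObjectId values.
const votes = [
  { user_id: "u1", article_id: "a1", score: 5 },
  { user_id: "u2", article_id: "a1", score: 3 },
  { user_id: "u3", article_id: "a1", score: 3 },
  { user_id: "u4", article_id: "a1", score: 4 },
  { user_id: "u1", article_id: "a2", score: 2 },
  { user_id: "u2", article_id: "a2", score: 1 },
];

// Hypothetical helper: step 1 counts votes per article, step 2 returns the
// original vote documents for articles with more than minVotes votes.
function votesForPopularArticles(voteDocs, minVotes) {
  const perArticle = new Map();
  for (const vote of voteDocs) {
    perArticle.set(vote.article_id, (perArticle.get(vote.article_id) || 0) + 1);
  }
  return voteDocs.filter((vote) => perArticle.get(vote.article_id) > minVotes);
}

const kept = votesForPopularArticles(votes, 3);
console.log(kept);
```

Only the counts are held per article here, which mirrors why the two-step query avoids pushing every document into a group.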