find documents having a specific count of matches array - mongodb

I've searched high and low but haven't been able to find what I'm looking for, so apologies if this has already been asked.
Consider the following documents
{
_id: 1,
items: [
{
category: "A"
},
{
category: "A"
},
{
category: "B"
},
{
category: "C"
}]
},
{
_id: 2,
items: [
{
category: "A"
},
{
category: "B"
}]
},
{
_id: 3,
items: [
{
category: "A"
},
{
category: "A"
},
{
category: "A"
}]
}
I'd like to be able to find those documents which have more than 1 category "A" item in the items array. So this should find documents 1 and 3.
Is this possible?

Using aggregation
> db.spam.aggregate([
{$unwind: "$items"},
{$match: {"items.category" :"A"}},
{$group: {
_id: "$_id",
item: {$push: "$items.category"}, count: {$sum: 1}}
},
{$match: {count: {$gt: 1}}}
])
Output
{ "_id" : 3, "item" : [ "A", "A", "A" ], "count" : 3 }
{ "_id" : 1, "item" : [ "A", "A" ], "count" : 2 }
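To make the stages concrete, here is a minimal plain-JavaScript sketch (not MongoDB code) that mirrors the same logic on the sample documents held in memory. The helper name docsWithCategoryCount is hypothetical and only used for illustration.

```javascript
// Sample documents from the question, held in memory.
const docs = [
  { _id: 1, items: [{ category: "A" }, { category: "A" }, { category: "B" }, { category: "C" }] },
  { _id: 2, items: [{ category: "A" }, { category: "B" }] },
  { _id: 3, items: [{ category: "A" }, { category: "A" }, { category: "A" }] },
];

// Hypothetical helper: count the items of one category per document
// and keep only the documents whose count exceeds `min`.
function docsWithCategoryCount(docs, category, min) {
  return docs
    .map((doc) => ({
      _id: doc._id,
      // $unwind + $match + $group with $sum: 1 boils down to this count
      count: doc.items.filter((item) => item.category === category).length,
    }))
    // final $match: { count: { $gt: min } }
    .filter((row) => row.count > min);
}

const matches = docsWithCategoryCount(docs, "A", 1);
console.log(matches);
```

Unlike the pipeline, this sketch keeps the documents in their original order, since nothing is regrouped.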

Apply multistage grouping in MongoDb Aggregation Framework

Let's assume I have the following data:
[
{ name: "Clint", hairColor: "brown", shoeSize: 8, income: 20000 },
{ name: "Clint", hairColor: "blond", shoeSize: 9, income: 30000 },
{ name: "George", hairColor: "brown", shoeSize: 7, income: 30000 },
{ name: "George", hairColor: "blond", shoeSize: 8, income: 10000 },
{ name: "George", hairColor: "blond", shoeSize: 9, income: 20000 }
]
I want to have the following output:
[
{
name: "Clint",
counts: 2,
avgShoesize: 8.5,
shoeSizeByHairColor: [
{ _id: "brown", counts: 1, avgShoesize: 8 },
{ _id: "blond", counts: 1, avgShoesize: 9 },
],
incomeByHairColor: [
{ _id: "brown", counts: 1, avgIncome: 20000 },
{ _id: "blond", counts: 1, avgIncome: 30000 },
]
},
{
name: "George",
counts: 3,
avgShoesize: 8,
shoeSizeByHairColor: [
{ _id: "brown", counts: 1, avgShoesize: 7 },
{ _id: "blond", counts: 2, avgShoesize: 8.5 },
],
incomeByHairColor: [
{ _id: "brown", counts: 1, avgIncome: 30000 },
{ _id: "blond", counts: 2, avgIncome: 15000 },
],
}
]
Basically I want to group my dataset by some key and then I want to have multiple groups of the subset.
First I thought of applying a $group with the key name and then using $facet in order to have various aggregations. I guess this will not work, since $facet does not use the subset from the previous $group. If I use $facet first, I would need to split the result into multiple documents.
Any ideas how to properly solve my problem?
You need a double $group: the first one should aggregate by name and hairColor, and the second one can build the nested arrays:
db.collection.aggregate([
{
$group: {
_id: { name: "$name", hairColor: "$hairColor" },
count: { $sum: 1 },
sumShoeSize: { $sum: "$shoeSize" },
avgShoeSize: { $avg: "$shoeSize" },
avgIncome: { $avg: "$income" },
docs: { $push: "$$ROOT" }
}
},
{
$group: {
_id: "$_id.name",
count: { $sum: "$count" },
sumShoeSize: { $sum: "$sumShoeSize" },
shoeSizeByHairColor: {
$push: {
_id: "$_id.hairColor", counts: "$count", avgShoeSize: "$avgShoeSize"
}
},
incomeByHairColor: {
$push: {
_id: "$_id.hairColor", counts: "$count", avgIncome: "$avgIncome"
}
}
}
},
{
$project: {
_id: 1,
count: 1,
avgShoeSize: { $divide: [ "$sumShoeSize", "$count" ] },
shoeSizeByHairColor: 1,
incomeByHairColor: 1
}
}
])
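The pipeline carries sumShoeSize through the second $group and divides by the total count at the end, rather than averaging the per-group averages. A minimal plain-JavaScript sketch of why (not MongoDB code; the helper names and the naiveAvg field are hypothetical and exist only for comparison):

```javascript
// Sample data from the question.
const people = [
  { name: "Clint", hairColor: "brown", shoeSize: 8, income: 20000 },
  { name: "Clint", hairColor: "blond", shoeSize: 9, income: 30000 },
  { name: "George", hairColor: "brown", shoeSize: 7, income: 30000 },
  { name: "George", hairColor: "blond", shoeSize: 8, income: 10000 },
  { name: "George", hairColor: "blond", shoeSize: 9, income: 20000 },
];

// First level: one bucket per (name, hairColor), keeping sum and count.
function groupByNameAndHair(rows) {
  const buckets = new Map();
  for (const row of rows) {
    const key = row.name + "|" + row.hairColor;
    const b = buckets.get(key) ||
      { name: row.name, hairColor: row.hairColor, count: 0, sumShoeSize: 0 };
    b.count += 1;
    b.sumShoeSize += row.shoeSize;
    buckets.set(key, b);
  }
  return [...buckets.values()];
}

// Second level: roll the buckets up per name.
function rollUpByName(buckets) {
  const byName = new Map();
  for (const b of buckets) {
    const g = byName.get(b.name) ||
      { name: b.name, count: 0, sumShoeSize: 0, sumOfAvgs: 0, groups: 0 };
    g.count += b.count;
    g.sumShoeSize += b.sumShoeSize;
    g.sumOfAvgs += b.sumShoeSize / b.count;
    g.groups += 1;
    byName.set(b.name, g);
  }
  return [...byName.values()].map((g) => ({
    name: g.name,
    count: g.count,
    avgShoeSize: g.sumShoeSize / g.count, // weighted, like the $divide stage
    naiveAvg: g.sumOfAvgs / g.groups,     // unweighted average of averages
  }));
}

const summary = rollUpByName(groupByNameAndHair(people));
console.log(summary);
```

For George the two values differ, because the blond group holds two of his three rows.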
Phase 1: Group by name and hairColor, accumulating count, avgShoeSize, avgIncome and hairColors.
Phase 2: Use the $map operator to build the incomeByHairColor and shoeSizeByHairColor arrays from the accumulated values.
Phase 3: Finally, group by name, accumulating incomeByHairColor, shoeSizeByHairColor and count.
Pipeline:
db.users.aggregate([
{
$group :{
_id: {
name : "$name",
hairColor: "$hairColor"
},
count : {"$sum": 1},
avgShoeSize: {$avg: "$shoeSize"},
avgIncome : {$avg: "$income"},
hairColors : {$addToSet:"$hairColor" }
}
},
{
$project: {
_id:0,
name : "$_id.name",
hairColor: "$_id.hairColor",
count : "$count",
incomeByHairColor : {
$map: {
input: "$hairColors",
as: "key",
in: {
_id: "$$key",
counts: "$count",
avgIncome: "$avgIncome"
}
}
},
shoeSizeByHairColor:{
$map: {
input: "$hairColors",
as: "key",
in: {
_id: "$$key",
counts: "$count",
avgShoeSize: "$avgShoeSize"
}
}
}
}
},
{
$group: {
_id : "$name",
count : {$sum: "$count"},
incomeByHairColor: {$push : "$incomeByHairColor"},
shoeSizeByHairColor : {$push : "$shoeSizeByHairColor"}
}
}
]
)
Output:
/* 1 */
{
"_id" : "Clint",
"count" : 2,
"incomeByHairColor" : [
[
{
"_id" : "blond",
"counts" : 1,
"avgIncome" : 30000
}
],
[
{
"_id" : "brown",
"counts" : 1,
"avgIncome" : 20000
}
]
],
"shoeSizeByHairColor" : [
[
{
"_id" : "blond",
"counts" : 1,
"avgShoeSize" : 9
}
],
[
{
"_id" : "brown",
"counts" : 1,
"avgShoeSize" : 8
}
]
]
},
/* 2 */
{
"_id" : "George",
"count" : 3,
"incomeByHairColor" : [
[
{
"_id" : "blond",
"counts" : 2,
"avgIncome" : 15000
}
],
[
{
"_id" : "brown",
"counts" : 1,
"avgIncome" : 30000
}
]
],
"shoeSizeByHairColor" : [
[
{
"_id" : "blond",
"counts" : 2,
"avgShoeSize" : 8.5
}
],
[
{
"_id" : "brown",
"counts" : 1,
"avgShoeSize" : 7
}
]
]
}
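Note that in the output above each entry sits in its own one-element array, because $map yields one array per (name, hairColor) group and $push then collects those arrays. A minimal plain-JavaScript sketch (not MongoDB code; flattenGroups is a hypothetical helper) of flattening one level to get the shape asked for in the question:

```javascript
// Shape copied from the George result above.
const george = {
  _id: "George",
  count: 3,
  incomeByHairColor: [
    [{ _id: "blond", counts: 2, avgIncome: 15000 }],
    [{ _id: "brown", counts: 1, avgIncome: 30000 }],
  ],
  shoeSizeByHairColor: [
    [{ _id: "blond", counts: 2, avgShoeSize: 8.5 }],
    [{ _id: "brown", counts: 1, avgShoeSize: 7 }],
  ],
};

// Hypothetical helper: flatten the two nested arrays by one level.
function flattenGroups(doc) {
  return {
    ...doc,
    incomeByHairColor: doc.incomeByHairColor.flat(),
    shoeSizeByHairColor: doc.shoeSizeByHairColor.flat(),
  };
}

const flatGeorge = flattenGroups(george);
console.log(JSON.stringify(flatGeorge, null, 2));
```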

Mongo shell query for survey stats($unwind with 2D array)

My document structure (only 3 documents shown, to give the idea):
/* 1 */
{
"_id" : ObjectId("59edc58af33e9b5988b875fa"),
"Agent" : {
"Name" : "NomanAgent",
"Location" : "Lahore",
"AgentId" : 66,
"Suggestion" : [
"Knowledge",
"Professionalisn"
]
},
"Rating" : 2,
"Status" : "Submitted"
}
/* 2 */
{
"_id" : ObjectId("59edc58af33e9b5988b875fb"),
"Agent" : {
"Name" : "NomanAgent",
"Location" : "Lahore",
"AgentId" : 66,
"Suggestion" : [
"Knowledge",
"Clarity"
]
},
"Rating" : 1,
"Status" : "Submitted"
}
/* 3 */
{
"_id" : ObjectId("59edc58af33e9b5988b875fc"),
"Agent" : {
"Name" : "NomanAgent",
"Location" : "Lahore",
"AgentId" : 66,
"Reward" : "Thumb Up"
},
"Rating" : 5,
"Status" : "Submitted"
}
These are survey responses, so an Agent object can contain either a Suggestion (for a bad customer review) or a Reward (for a happy customer). Here I am showing 2 documents with Suggestions and 1 with a Reward.
I have created a query for the Rewards, given below:
db.getCollection('_survey.response').aggregate([
{
$group:{
_id: "$Agent.Name",
Rating: {$avg: "$Rating"},
Rewards: {$push: "$Agent.Reward"},
Status: {$push : "$Status"}
}
},
{
$unwind: "$Rewards"
},
{
$group:{
_id: {
Agent: "$_id",
Rating: "$Rating",
Rewards: "$Rewards"
},
RewardCount:{$sum: 1},
SurveyStatus: {$first: "$Status"}
}
},
{
$group:{
_id: "$_id.Agent",
Rewards: {$push:{Reward: "$_id.Rewards", Count: "$RewardCount"}},
Rating: {$first: "$_id.Rating"},
SurveyStatus: {$first: "$SurveyStatus"}
}
},
{
$unwind: "$SurveyStatus"
},
{
$group:{
_id: {
Agent: "$_id",
Survey: "$SurveyStatus"
},
StatusCount:{$sum : 1},
Rating: {$first: "$Rating"},
Rewards: {$first: "$Rewards"}
}
},
{
$group:{
_id: "$_id.Agent",
Status:{$push:{Status: "$_id.Survey", Count: "$StatusCount"}},
Rewards: {$first: "$Rewards"},
Rating: {$first: "$Rating"}
}
},
{
$project:{
_id: 0,
Agent: "$_id",
Rating: {
$multiply:[
{$divide:["$Rating",5]},
100
]
},
Status: 1,
Rewards: 1
}
}
]);
The query above works perfectly for the rewards. I want exactly the same thing for suggestions, and I would be happy if the suggestions could be handled in the same query (a separate query for suggestions is also fine).
Response of the query above:
/* 1 */
{
"Status" : [
{
"Status" : "Submitted",
"Count" : 2.0
},
{
"Status" : "Pending",
"Count" : 1.0
},
{
"Status" : "Opened",
"Count" : 2.0
}
],
"Rewards" : [
{
"Reward" : "Thumb Up",
"Count" : 1.0
},
{
"Reward" : "Thank You",
"Count" : 2.0
}
],
"Agent" : "GhazanferAgent",
"Rating" : 68.0
}
/* 2 */
{
"Status" : [
{
"Status" : "Opened",
"Count" : 2.0
},
{
"Status" : "Viewed",
"Count" : 2.0
},
{
"Status" : "Pending",
"Count" : 3.0
}
],
"Rewards" : [
{
"Reward" : "Gift",
"Count" : 1.0
},
{
"Reward" : "Thumb Up",
"Count" : 3.0
},
{
"Reward" : "Thank You",
"Count" : 1.0
}
],
"Agent" : "NomanAgent",
"Rating" : 60.0
}
Here is what I have tried so far. I can think of two approaches, but each has an issue.
First (find the average rating and push status and suggestions into arrays):
db.getCollection("_survey.response").aggregate([
{
$match:
{
$and:[
{
"Agent.Suggestion":{
$exists: true
}
},
{
Rating: {$lte: 3}
}
]
}
},
{
$group:{
_id: {
AgentName: "$Agent.Name",
AgentId: "$Agent.AgentId",
Location: "$Agent.Location"
},
Rating: {$avg: "$Rating"},
Status: {$push : "$Status"},
Suggestions: {$push: "$Agent.Suggestion"}
}
}
]);
The issue with this approach is that Suggestions in the result becomes an array of arrays (since Agent.Suggestion was already an array), with a size that depends on how many responses contain suggestions for the agent. So the problem is applying $unwind to a 2D array of dynamic size.
Second ($unwind the suggestions in the first stage, while they are still a 1D array, to avoid the $unwind issue with a 2D array of dynamic size):
db.getCollection("_survey.response").aggregate([
{
$match:
{
$and:[
{
"Agent.Suggestion":{
$exists: true
}
},
{
Rating: {$lte: 3}
}
]
}
},
{
$unwind: "$Agent.Suggestion"
},
{
$group: {
_id:{
AgentName: "$Agent.Name",
AgentId: "$Agent.AgentId",
Suggestion: "$Agent.Suggestion",
Location: "$Agent.Location"
},
Status: {$push: "$Status"},
Rating: {$avg: "$Rating"},
Count: {$sum: 1}
}
}
]);
The problem with this approach is that unwinding the Suggestion array flattens every suggestion together with its agent, which increases the number of documents compared to the original responses. As a result I can't compute the correct average rating for each agent from this grouping, and the same goes for Status: both fields are only correct when I group by agent alone, whereas here I am grouping by agent and suggestion.
I want exactly the same response for the Suggestion query, with a Suggestions object replacing the Rewards object (or, even better, a Suggestions object alongside Rewards in the same response).
Survey status can be Pending, Opened, Viewed, Submitted, etc.
Output explanation:
I want suggestions (with counts), status (with counts) and Rating as a percentage (which I am already doing) for each agent, as in the output shown above.
Thanks in advance!
Using $unwind twice in a row did the trick for me, building on the first approach:
db.getCollection("_survey.response").aggregate([
{
$match:
{
$and:[
{
"Agent.Suggestion":{
$exists: true
}
},
{
Rating: {$lte: 3}
}
]
}
},
{
$group:{
_id: {
AgentName: "$Agent.Name",
AgentId: "$Agent.AgentId",
Location: "$Agent.Location"
},
Rating: {$avg: "$Rating"},
Status: {$push : "$Status"},
Suggestions: {$push: "$Agent.Suggestion"}
}
},
{
$unwind: "$Suggestions"
},
{
$unwind: "$Suggestions"
},
{
$group: {
_id: {
Suggestions: "$Suggestions",
AgentName: "$_id.AgentName",
AgentId: "$_id.AgentId",
Location: "$_id.Location"
},
SuggestionCount: {$sum: 1},
Rating: {$first: "$Rating"},
Status: {$first: "$Status"}
}
},
{
$group: {
_id:{
AgentName: "$_id.AgentName",
AgentId: "$_id.AgentId",
Location: "$_id.Location"
},
Suggestions: {$push:{Suggestion: "$_id.Suggestions", Count: "$SuggestionCount"}},
TotalSuggestions: {$sum: "$SuggestionCount"},
Rating: {$first: "$Rating"},
Status: {$first: "$Status"}
}
},
{
$unwind: "$Status"
},
{
$group:{
_id: {
AgentName: "$_id.AgentName",
AgentId: "$_id.AgentId",
Location: "$_id.Location",
Status: "$Status"
},
StatusCount:{$sum : 1},
Rating: {$first: "$Rating"},
Suggestions: {$first: "$Suggestions"},
TotalSuggestions: {$first: "$TotalSuggestions"}
}
},
{
$group:{
_id: {
AgentName: "$_id.AgentName",
AgentId: "$_id.AgentId",
Location: "$_id.Location"
},
Status:{$push:{Status: "$_id.Status", Count: "$StatusCount"}},
TotalStatus: {$sum: "$StatusCount"},
Suggestions: {$first: "$Suggestions"},
TotalSuggestions: {$first: "$TotalSuggestions"},
Rating: {$first: "$Rating"}
}
},
{
$project: {
_id: 0,
AgentName: "$_id.AgentName",
AgentId: "$_id.AgentId",
Location: "$_id.Location",
Status: 1,
TotalStatus: 1,
Suggestions: 1,
TotalSuggestions: 1,
Performance: {
$concat: [
{
$substr: [
{
$multiply:[
{$divide:["$Rating",5]},
100
]
}, 0, 4
]
},"%"
]
}
}
}
]);
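The core of this pipeline can be followed with a minimal plain-JavaScript sketch (not MongoDB code) on the two low-rated sample responses from the question. The flattened field names and the tally helper are assumptions made for illustration only.

```javascript
// The two sample responses with Rating <= 3 and a Suggestion array.
const responses = [
  { agent: "NomanAgent", rating: 2, status: "Submitted", suggestion: ["Knowledge", "Professionalisn"] },
  { agent: "NomanAgent", rating: 1, status: "Submitted", suggestion: ["Knowledge", "Clarity"] },
];

// Hypothetical helper: tally how often each value occurs.
function tally(values) {
  const counts = {};
  for (const v of values) counts[v] = (counts[v] || 0) + 1;
  return counts;
}

// First $group: the average rating is computed per agent BEFORE any
// unwinding, so it is not skewed by the number of suggestions.
const ratings = responses.map((r) => r.rating);
const avgRating = ratings.reduce((a, b) => a + b, 0) / ratings.length;

// $push of an array field yields an array of arrays.
const suggestions2D = responses.map((r) => r.suggestion);

// Two $unwind stages in a row flatten the 2D array one level at a time;
// flat() does the same in memory before counting.
const suggestionCounts = tally(suggestions2D.flat());
const statusCounts = tally(responses.map((r) => r.status));

// Same formula as the $project stage: rating out of 5 as a percentage.
const performance = (avgRating / 5) * 100;

console.log(suggestionCounts, statusCounts, performance);
```

Because the rating is averaged in the first grouping, the later unwinding only affects the suggestion and status counts.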

MongoDB: Project to array item with minimum value of field

Suppose my collection consists of documents that look like this:
{
"items" : [
{
"item_id": 1,
"item_field": 10
},
{
"item_id": 2,
"item_field": 15
},
{
"item_id": 3,
"item_field": 3
},
]
}
Can I somehow select the entry of items with the lowest value of item_field, in this case the one with item_id 3?
I'm ok with using the aggregation framework. Bonus point if you can give me the code for the C# driver.
You can use a $reduce expression in the following way.
The query below seeds initialValue with the first element of items, then compares each element's item_field against the running minimum with $lt: if the comparison is true, $$this becomes the new $$value, otherwise the previous value is kept. $reduce folds the whole array this way, and $project outputs the minimum item. Seeding with the whole element (rather than only its item_field) keeps item_id in the result when the first element is already the minimum.
db.collection.aggregate([
{
$project: {
items: {
$reduce: {
input: "$items",
initialValue: { $arrayElemAt: ["$items", 0] },
in: {
$cond: [{ $lt: ["$$this.item_field", "$$value.item_field"] }, "$$this", "$$value" ]
}
}
}
}
}
])
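The same fold can be written with Array.prototype.reduce, which may make the $reduce/$cond logic easier to follow. This is a minimal plain-JavaScript sketch, not MongoDB code; minItem is a hypothetical helper, and seeding the accumulator with the whole first element is a choice made in this sketch.

```javascript
// Sample document from the question.
const doc = {
  items: [
    { item_id: 1, item_field: 10 },
    { item_id: 2, item_field: 15 },
    { item_id: 3, item_field: 3 },
  ],
};

// Hypothetical helper mirroring the $reduce/$cond logic:
// `value` plays the role of $$value, `current` the role of $$this.
function minItem(items) {
  return items.reduce(
    (value, current) => (current.item_field < value.item_field ? current : value),
    items[0] // seed: the whole first element, so item_id is never lost
  );
}

const lowest = minItem(doc.items);
console.log(lowest);
```

For an empty array the helper returns undefined, since the seed items[0] does not exist.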
You can use $unwind to separate the items entries.
Then $sort by item_field ascending and $group, taking the first item per document.
db.coll.find().pretty()
{
"_id" : ObjectId("58edec875748bae2cc391722"),
"items" : [
{
"item_id" : 1,
"item_field" : 10
},
{
"item_id" : 2,
"item_field" : 15
},
{
"item_id" : 3,
"item_field" : 3
}
]
}
db.coll.aggregate([
{$unwind: {path: '$items', includeArrayIndex: 'index'}},
{$sort: { 'items.item_field': 1}},
{$group: {_id: '$_id', item: {$first: '$items'}}}
])
{ "_id" : ObjectId("58edec875748bae2cc391722"), "item" : { "item_id" : 3, "item_field" : 3 } }
We can get the expected result using the following query:
db.testing.aggregate([{$unwind:"$items"}, {$sort: { 'items.item_field': 1}},{$group: {_id: "$_id", minItem: {$first: '$items'}}}])
The result is:
{ "_id" : ObjectId("58edf28c73fed29f4b741731"), "minItem" : { "item_id" : 3, "item_field" : 3 } }
{ "_id" : ObjectId("58edec3373fed29f4b741730"), "minItem" : { "item_id" : 3, "item_field" : 3 } }

Finding all documents which share the same value in an array

Consider I have the following data below:
{
"id":123,
"name":"apple",
"codes":["ABC", "DEF", "EFG"]
}
{
"id":234,
"name":"pineapple",
"codes":["DEF"]
}
{
"id":345,
"name":"banana",
"codes":["HIJ","KLM"]
}
If I don't want to search by a specific code, is there a way to find all fruits in my MongoDB collection that share the same code?
db.collection.aggregate([
{ $unwind: '$codes' },
{ $group: { _id: '$codes', count: {$sum:1}, fruits: {$push: '$name'}}},
{ $match: {'count': {$gt:1}}},
{ $group:{_id:null, total:{$sum:1}, data:{$push:{fruits: '$fruits', code:'$_id'}}}}
])
result:
{ "_id" : null, "total" : 1, "data" : [ { "fruits" : [ "apple", "pineapple" ], "code" : "DEF" } ] }
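What the pipeline does is invert the documents into a code-to-fruits mapping and keep only the codes used more than once. A minimal plain-JavaScript sketch of that inversion (not MongoDB code; sharedCodes is a hypothetical helper):

```javascript
// Sample data from the question.
const fruits = [
  { id: 123, name: "apple", codes: ["ABC", "DEF", "EFG"] },
  { id: 234, name: "pineapple", codes: ["DEF"] },
  { id: 345, name: "banana", codes: ["HIJ", "KLM"] },
];

// Hypothetical helper: map every code to the fruits that carry it,
// then keep only the codes shared by more than one fruit.
function sharedCodes(docs) {
  const byCode = new Map();
  for (const doc of docs) {
    for (const code of doc.codes) {        // $unwind: '$codes'
      if (!byCode.has(code)) byCode.set(code, []);
      byCode.get(code).push(doc.name);     // $group with $push: '$name'
    }
  }
  return [...byCode.entries()]
    .filter(([, names]) => names.length > 1) // $match: count > 1
    .map(([code, names]) => ({ code, fruits: names }));
}

const shared = sharedCodes(fruits);
console.log(shared);
```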

Querying the total number of elements in nested arrays - embed documents MongoDB

I have documents in my collection like these:
{
_id: 1,
activities: [
{
activity_id: 1,
travel: [
{
point_id: 1,
location: [-76.0,19.1]
},
{
point_id: 2,
location: [-77.0,19.3]
}
]
},
{
activity_id: 2,
travel: [
{
point_id: 3,
location: [-99.3,18.2]
}
]
}
]
},
{
_id: 2,
activities: [
{
activity_id: 3,
travel: [
{
point_id: 4,
location: [-75.0,11.1]
}
]
}
]
}
I can get the total number of activities, as follows:
db.mycollection.aggregate(
{$unwind: "$activities"},
{$project: {count:{$add:1}}},
{$group: {_id: null, number: {$sum: "$count" }}}
)
I get (3 activities):
{ "result" : [ { "_id" : null, "number" : 3 } ], "ok" : 1 }
Question: How can I get the total number of elements in all travel arrays?
Expected result: 4 elements.
These are:
{
point_id: 1,
location: [-76.0,19.1]
},
{
point_id: 2,
location: [-77.0,19.3]
},
{
point_id: 3,
location: [-99.3,18.2]
},
{
point_id: 4,
location: [-75.0,11.1]
}
You can easily transform the documents by using a double $unwind, e.g.:
db.collection.aggregate([
{$unwind: "$activities"},
{$unwind: "$activities.travel"},
{$group:{
_id:null,
travel: {$push: {
point_id:"$activities.travel.point_id",
location:"$activities.travel.location"}}
}},
{$project:{_id:0, travel:"$travel"}}
])
This emits output that is very close to your desired format:
{
"travel" : [
{
"point_id" : 1.0,
"location" : [
-76.0,
19.1
]
},
{
"point_id" : 2.0,
"location" : [
-77.0,
19.3
]
},
{
"point_id" : 3.0,
"location" : [
-99.3,
18.2
]
},
{
"point_id" : 4.0,
"location" : [
-75.0,
11.1
]
}
]
}
Update:
If you just want to know total number of travel documents in whole collection,
try:
db.collection.aggregate([
{$unwind: "$activities"},
{$unwind: "$activities.travel"},
{$group: {_id:0, total:{$sum:1}}}
])
It will print:
{
"_id" : NumberInt(0),
"total" : NumberInt(4)
}
Update 2:
OP wants to filter documents based on some property in aggregation framework. Here is a way to do so:
db.collection.aggregate([
{$unwind: "$activities"},
{$match:{"activities.activity_id":1}},
{$unwind: "$activities.travel"},
{$group: {_id:0, total:{$sum:1}}}
])
It will print (based on sample document):
{ "_id" : 0, "total" : 2 }
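The counting done by the last two pipelines can be checked with a minimal plain-JavaScript sketch over the sample documents (not MongoDB code; countTravelPoints and its optional activityId parameter are hypothetical):

```javascript
// Sample documents from the question.
const docs = [
  {
    _id: 1,
    activities: [
      {
        activity_id: 1,
        travel: [
          { point_id: 1, location: [-76.0, 19.1] },
          { point_id: 2, location: [-77.0, 19.3] },
        ],
      },
      { activity_id: 2, travel: [{ point_id: 3, location: [-99.3, 18.2] }] },
    ],
  },
  {
    _id: 2,
    activities: [
      { activity_id: 3, travel: [{ point_id: 4, location: [-75.0, 11.1] }] },
    ],
  },
];

// Hypothetical helper: count travel points, optionally for one activity only.
function countTravelPoints(docs, activityId) {
  let total = 0;
  for (const doc of docs) {
    for (const activity of doc.activities) {   // first $unwind
      // optional $match on activities.activity_id
      if (activityId !== undefined && activity.activity_id !== activityId) continue;
      total += activity.travel.length;         // second $unwind + $sum: 1
    }
  }
  return total;
}

const totalAll = countTravelPoints(docs);
const totalForActivity1 = countTravelPoints(docs, 1);
console.log(totalAll, totalForActivity1);
```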