MongoDB - get max from result of avg aggregation

I have a collection of products, and each product has assessments. I need to select the product with the highest average assessment. I can compute the average per product, but I cannot work out how to also select the product with the highest average.
To reproduce my problem, follow these steps:
Insert products:
db.products.insert([
{
name: "Product1",
price: 1000,
features: {
feature1: 0.8,
feature2: 23
},
tags: ["tag1", "tag2", "tag3", "tag4"],
assessments: [
{name: "John", assessment: 3},
{name: "Anna", assessment: 4},
{name: "Kyle", assessment: 3.6}
]
},
{
name: "Product2",
price: 1200,
features: {
feature1: 4,
feature2: 4000,
feature3: "SDS"
},
tags: ["tag1"],
assessments: [
{name: "John", assessment: 5},
{name: "Richard", assessment: 4.8}
]
},
{
name: "Product3",
price: 450,
features: {
feature1: 1.3,
feature2: 60
},
tags: ["tag1", "tag2"],
assessments: [
{name: "Anna", assessment: 5},
{name: "Robert", assessment: 4},
{name: "John", assessment: 4},
{name: "Julia", assessment: 3}
]
},
{
name: "Product4",
price: 900,
features: {
feature1: 1700,
feature2: 17
},
tags: ["tag1", "tag2", "tag3"],
assessments: [
{name: "Monica", assessment: 3},
{name: "Carl", assessment: 4}
]
}
])
I want to compute the average assessment of each product and then select the product with the highest average.
I do it as follows:
db.products.aggregate([
{ $unwind : "$assessments" },
{ $group:
{
_id: "$name",
avg_assessment: {$avg: "$assessments.assessment"}
}
},
{ $project:
{
_id: 0,
product: "$_id",
avg_assessment: 1
}
}
])
Result of this query is:
{ "avg_assessment" : 3.5, "product" : "Product4" }
{ "avg_assessment" : 4, "product" : "Product3" }
{ "avg_assessment" : 4.9, "product" : "Product2" }
{ "avg_assessment" : 3.533333333333333, "product" : "Product1" }
Nice. Then I try to select the product with the highest average using the following query:
db.products.aggregate([
{ $unwind : "$assessments" },
{ $group:
{
_id: "$name",
avg_assessment: { $max: {$avg: "$assessments.assessment"}}
}
},
{ $project:
{
_id: 0,
product: "$_id",
avg_assessment: 1
}
}
])
But the result has the same shape, only with what look like rounded-up values:
{ "avg_assessment" : 4, "product" : "Product4" }
{ "avg_assessment" : 5, "product" : "Product3" }
{ "avg_assessment" : 5, "product" : "Product2" }
{ "avg_assessment" : 4, "product" : "Product1" }
What's going on? Where is the problem?

You can try the aggregation below. No $unwind is needed here.
Compute $avg for each product with $addFields, then sort in descending order.
$group with $first picks the product with the highest average.
You can add a $project stage afterwards to limit the returned fields.
db.products.aggregate([
{ "$addFields" : {"avg_assessment":{"$avg":"$assessments.assessment" }}},
{ "$sort":{"avg_assessment":-1}},
{ "$group":
{
"_id": null,
"highest_avg_assessment": { $first:"$$ROOT"}
}
}
])
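
To sanity-check what this pipeline should return, the same computation can be mirrored in plain JavaScript outside MongoDB. This is only an illustrative sketch: the `products` array is a trimmed copy of the question's sample documents, and the helper name `average` is made up for the example.

```javascript
// Trimmed copy of the question's sample documents (only the fields used here).
const products = [
  { name: "Product1", assessments: [{ assessment: 3 }, { assessment: 4 }, { assessment: 3.6 }] },
  { name: "Product2", assessments: [{ assessment: 5 }, { assessment: 4.8 }] },
  { name: "Product3", assessments: [{ assessment: 5 }, { assessment: 4 }, { assessment: 4 }, { assessment: 3 }] },
  { name: "Product4", assessments: [{ assessment: 3 }, { assessment: 4 }] },
];

// Hypothetical helper: arithmetic mean of a non-empty array of numbers.
const average = (nums) => nums.reduce((sum, n) => sum + n, 0) / nums.length;

// Mirrors $addFields with $avg over "$assessments.assessment":
// one average per product, no unwinding needed.
const withAvg = products.map((p) => ({
  name: p.name,
  avg_assessment: average(p.assessments.map((a) => a.assessment)),
}));

// Mirrors $sort (descending) followed by taking the first document.
const best = [...withAvg].sort((a, b) => b.avg_assessment - a.avg_assessment)[0];

console.log(best);
```

On the sample data this picks Product2, which agrees with the 4.9 average shown in the question's first result.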

This might help:
db.products.aggregate([
{ $unwind : "$assessments" },
{ $group:
{
_id: "$name",
avg_assessment: {$avg: "$assessments.assessment"}
}
},
{
$sort: { avg_assessment: -1 } // sort by avg_assessment descending
},
{
$limit: 1 // only return one document
}
])
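
As for why the `$max: {$avg: ...}` attempt returned 4, 5, 5, 4: those values match the largest single assessment of each product rather than a rounded average. A likely explanation is that, after `$unwind`, the inner `$avg` is evaluated per document on a single number (so it returns that number unchanged), and `$max` then takes the largest of those values within the group. The plain JavaScript sketch below reproduces that reading on the sample data. It illustrates the behaviour and is not MongoDB code; the variable names are invented for the example.

```javascript
// Assessments per product, copied from the sample data in the question.
const assessmentsByProduct = {
  Product1: [3, 4, 3.6],
  Product2: [5, 4.8],
  Product3: [5, 4, 4, 3],
  Product4: [3, 4],
};

// After $unwind every document carries ONE assessment, so the "average"
// of that single value is the value itself.
const avgOfSingleValue = (value) => value;

const maxPerProduct = {};
for (const [product, values] of Object.entries(assessmentsByProduct)) {
  const perDocument = values.map(avgOfSingleValue);
  // $max then keeps the largest per-document value in the group.
  maxPerProduct[product] = Math.max(...perDocument);
}

console.log(maxPerProduct);
```

The computed values are 4, 5, 5 and 4 for Product1 to Product4, the same numbers the second query in the question returned.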

Related

Apply multistage grouping in MongoDB Aggregation Framework

Let's assume I have the following data:
[
{ name: "Clint", hairColor: "brown", shoeSize: 8, income: 20000 },
{ name: "Clint", hairColor: "blond", shoeSize: 9, income: 30000 },
{ name: "George", hairColor: "brown", shoeSize: 7, income: 30000 },
{ name: "George", hairColor: "blond", shoeSize: 8, income: 10000 },
{ name: "George", hairColor: "blond", shoeSize: 9, income: 20000 }
]
I want to have the following output:
[
{
name: "Clint",
counts: 2,
avgShoesize: 8.5,
shoeSizeByHairColor: [
{ _id: "brown", counts: 1, avgShoesize: 8 },
{ _id: "blond", counts: 1, avgShoesize: 9 },
],
incomeByHairColor: [
{ _id: "brown", counts: 1, avgIncome: 20000 },
{ _id: "blond", counts: 1, avgIncome: 30000 },
]
},
{
name: "George",
counts: 3,
avgShoesize: 8,
shoeSizeByHairColor: [
{ _id: "brown", counts: 1, avgShoesize: 7 },
{ _id: "blond", counts: 2, avgShoesize: 8.5 },
],
incomeByHairColor: [
{ _id: "brown", counts: 1, avgIncome: 30000 },
{ _id: "blond", counts: 2, avgIncome: 15000 },
],
}
]
Basically I want to group my dataset by some key and then compute several sub-groupings within each group.
First I thought of applying a $group on the key name and then using $facet to run the various aggregations. I guess this will not work, since $facet does not operate per group on the subsets produced by the previous $group. If I used $facet first, I would need to split the result into multiple documents.
Any ideas how to properly solve my problem?
You need two $group stages: the first should aggregate by name and hairColor, and the second can build the nested arrays:
db.collection.aggregate([
{
$group: {
_id: { name: "$name", hairColor: "$hairColor" },
count: { $sum: 1 },
sumShoeSize: { $sum: "$shoeSize" },
avgShoeSize: { $avg: "$shoeSize" },
avgIncome: { $avg: "$income" },
docs: { $push: "$$ROOT" }
}
},
{
$group: {
_id: "$_id.name",
count: { $sum: "$count" },
sumShoeSize: { $sum: "$sumShoeSize" },
shoeSizeByHairColor: {
$push: {
_id: "$_id.hairColor", counts: "$count", avgShoeSize: "$avgShoeSize"
}
},
incomeByHairColor: {
$push: {
_id: "$_id.hairColor", counts: "$count", avgIncome: "$avgIncome"
}
}
}
},
{
$project: {
_id: 1,
count: 1,
avgShoeSize: { $divide: [ "$sumShoeSize", "$count" ] },
shoeSizeByHairColor: 1,
incomeByHairColor: 1
}
}
])
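
The reason for carrying sumShoeSize through both stages, instead of averaging the per-colour averages, is easier to see in a plain JavaScript mirror of the pipeline. This is a sketch under assumptions: the `people` array is the question's sample data, and all other names are hypothetical and only loosely follow the pipeline's field names.

```javascript
const people = [
  { name: "Clint", hairColor: "brown", shoeSize: 8, income: 20000 },
  { name: "Clint", hairColor: "blond", shoeSize: 9, income: 30000 },
  { name: "George", hairColor: "brown", shoeSize: 7, income: 30000 },
  { name: "George", hairColor: "blond", shoeSize: 8, income: 10000 },
  { name: "George", hairColor: "blond", shoeSize: 9, income: 20000 },
];

// Stage 1: group by (name, hairColor) and keep running sums.
const byNameAndHair = new Map();
for (const p of people) {
  const key = `${p.name}|${p.hairColor}`;
  const g = byNameAndHair.get(key) ||
    { name: p.name, hairColor: p.hairColor, count: 0, sumShoeSize: 0, sumIncome: 0 };
  g.count += 1;
  g.sumShoeSize += p.shoeSize;
  g.sumIncome += p.income;
  byNameAndHair.set(key, g);
}

// Stage 2: regroup by name, pushing one entry per hair colour.
const byName = new Map();
for (const g of byNameAndHair.values()) {
  const out = byName.get(g.name) ||
    { name: g.name, counts: 0, sumShoeSize: 0, shoeSizeByHairColor: [], incomeByHairColor: [] };
  out.counts += g.count;
  out.sumShoeSize += g.sumShoeSize;
  out.shoeSizeByHairColor.push({ _id: g.hairColor, counts: g.count, avgShoeSize: g.sumShoeSize / g.count });
  out.incomeByHairColor.push({ _id: g.hairColor, counts: g.count, avgIncome: g.sumIncome / g.count });
  byName.set(g.name, out);
}

// Stage 3: derive the overall average from the carried sum and count.
const result = [...byName.values()].map(({ sumShoeSize, ...rest }) => ({
  ...rest,
  avgShoeSize: sumShoeSize / rest.counts,
}));

console.log(JSON.stringify(result, null, 2));
```

For George the per-colour averages are 7 and 8.5, whose mean would be 7.75, while dividing the carried sum (24) by the carried count (3) gives the true average of 8.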
Phase 1: Group by name and hairColor, accumulating count, avgShoeSize, avgIncome and hairColors.
Phase 2: Use $project with the $map operator to build the incomeByHairColor and shoeSizeByHairColor arrays.
Phase 3: Finally, group by name, summing count and pushing incomeByHairColor and shoeSizeByHairColor. Because each pushed value is itself an array, the result contains nested arrays.
Pipeline:
db.users.aggregate([
{
$group :{
_id: {
name : "$name",
hairColor: "$hairColor"
},
count : {"$sum": 1},
avgShoeSize: {$avg: "$shoeSize"},
avgIncome : {$avg: "$income"},
hairColors : {$addToSet:"$hairColor" }
}
},
{
$project: {
_id:0,
name : "$_id.name",
hairColor: "$_id.hairColor",
count : "$count",
incomeByHairColor : {
$map: {
input: "$hairColors",
as: "key",
in: {
_id: "$$key",
counts: "$count",
avgIncome: "$avgIncome"
}
}
},
shoeSizeByHairColor:{
$map: {
input: "$hairColors",
as: "key",
in: {
_id: "$$key",
counts: "$count",
avgShoeSize: "$avgShoeSize"
}
}
}
}
},
{
$group: {
_id : "$name",
count : {$sum: "$count"},
incomeByHairColor: {$push : "$incomeByHairColor"},
shoeSizeByHairColor : {$push : "$shoeSizeByHairColor"}
}
}
]
)
Output:
/* 1 */
{
"_id" : "Clint",
"count" : 2,
"incomeByHairColor" : [
[
{
"_id" : "blond",
"counts" : 1,
"avgIncome" : 30000
}
],
[
{
"_id" : "brown",
"counts" : 1,
"avgIncome" : 20000
}
]
],
"shoeSizeByHairColor" : [
[
{
"_id" : "blond",
"counts" : 1,
"avgShoeSize" : 9
}
],
[
{
"_id" : "brown",
"counts" : 1,
"avgShoeSize" : 8
}
]
]
},
/* 2 */
{
"_id" : "George",
"count" : 3,
"incomeByHairColor" : [
[
{
"_id" : "blond",
"counts" : 2,
"avgIncome" : 15000
}
],
[
{
"_id" : "brown",
"counts" : 1,
"avgIncome" : 30000
}
]
],
"shoeSizeByHairColor" : [
[
{
"_id" : "blond",
"counts" : 2,
"avgShoeSize" : 8.5
}
],
[
{
"_id" : "brown",
"counts" : 1,
"avgShoeSize" : 7
}
]
]
}

MongoDB Aggregate how to pair relevant records for processing

I've got some event data captured in a MongoDB database, and some of these events occur in pairs.
Eg: DOOR_OPEN and DOOR_CLOSE are two events that occur in pairs
Events collection:
{ _id: 1, name: "DOOR_OPEN", userID: "user1", timestamp: t }
{ _id: 2, name: "DOOR_OPEN", userID: "user2", timestamp: t+5 }
{ _id: 3, name: "DOOR_CLOSE", userID: "user1", timestamp:t+10 }
{ _id: 4, name: "DOOR_OPEN", userID: "user1", timestamp:t+30 }
{ _id: 5, name: "SOME_OTHER_EVENT", userID: "user3", timestamp:t+35 }
{ _id: 6, name: "DOOR_CLOSE", userID: "user2", timestamp:t+40 }
...
Assuming the records are sorted on the timestamp, _id: 1 and _id: 3 are a "pair" for "user1", and _id: 2 and _id: 6 are a pair for "user2".
I'd like to take all these DOOR_OPEN and DOOR_CLOSE pairs per user and calculate statistics such as the average duration the door has been open for each user.
Can this be achieved using the aggregate framework?
You can use $lookup and $group to achieve this.
db.getCollection('TestColl').aggregate([
{ $match: {"name": { $in: [ "DOOR_OPEN", "DOOR_CLOSE" ] } }},
{ $lookup:
{
from: "TestColl",
let: { userID_lu: "$userID", name_lu: "$name", timestamp_lu :"$timestamp" },
pipeline: [
{ $match:
{ $expr:
{ $and:
[
{ $eq: [ "$userID", "$$userID_lu" ] },
{ $eq: [ "$$name_lu", "DOOR_OPEN" ]},
{ $eq: [ "$name", "DOOR_CLOSE" ]},
{ $gt: [ "$timestamp", "$$timestamp_lu" ] }
]
}
}
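},
{ $sort: { timestamp: 1 } }, // earliest matching close first
{ $limit: 1 } // keep only the close that pairs with this open
],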
as: "close_dates"
}
},
{ $addFields: { "close_time": { $arrayElemAt: [ "$close_dates.timestamp", 0 ] } } },
{ $addFields: { "time_diff": { $divide: [ { $subtract: [ "$close_time", "$timestamp" ] }, 1000 * 60 ]} } }, // Minutes
{ $group: { _id: "$userID" ,
events: { $push: { "eventId": "$_id", "name": "$name", "timestamp": "$timestamp" } },
averageTimestamp: {$avg: "$time_diff"}
}
}
])
Sample Data:
[
{ _id: 1, name: "DOOR_OPEN", userID: "user1", timestamp: ISODate("2019-10-24T08:00:00Z") },
{ _id: 2, name: "DOOR_OPEN", userID: "user2", timestamp: ISODate("2019-10-24T08:05:00Z") },
{ _id: 3, name: "DOOR_CLOSE", userID: "user1", timestamp:ISODate("2019-10-24T08:10:00Z") },
{ _id: 4, name: "DOOR_OPEN", userID: "user1", timestamp:ISODate("2019-10-24T08:30:00Z") },
{ _id: 5, name: "SOME_OTHER_EVENT", userID: "user3", timestamp:ISODate("2019-10-24T08:35:00Z") },
{ _id: 6, name: "DOOR_CLOSE", userID: "user2", timestamp:ISODate("2019-10-24T08:40:00Z") },
{ _id: 7, name: "DOOR_CLOSE", userID: "user1", timestamp:ISODate("2019-10-24T08:50:00Z") },
{ _id: 8, name: "DOOR_OPEN", userID: "user2", timestamp:ISODate("2019-10-24T08:55:00Z") }
]
Result:
/* 1 */
{
"_id" : "user2",
"events" : [
{
"eventId" : 2.0,
"name" : "DOOR_OPEN",
"timestamp" : ISODate("2019-10-24T08:05:00.000Z")
},
{
"eventId" : 6.0,
"name" : "DOOR_CLOSE",
"timestamp" : ISODate("2019-10-24T08:40:00.000Z")
},
{
"eventId" : 8.0,
"name" : "DOOR_OPEN",
"timestamp" : ISODate("2019-10-24T08:55:00.000Z")
}
],
"averageTimestamp" : 35.0
}
/* 2 */
{
"_id" : "user1",
"events" : [
{
"eventId" : 1.0,
"name" : "DOOR_OPEN",
"timestamp" : ISODate("2019-10-24T08:00:00.000Z")
},
{
"eventId" : 3.0,
"name" : "DOOR_CLOSE",
"timestamp" : ISODate("2019-10-24T08:10:00.000Z")
},
{
"eventId" : 4.0,
"name" : "DOOR_OPEN",
"timestamp" : ISODate("2019-10-24T08:30:00.000Z")
},
{
"eventId" : 7.0,
"name" : "DOOR_CLOSE",
"timestamp" : ISODate("2019-10-24T08:50:00.000Z")
}
],
"averageTimestamp" : 15.0
}
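
The pairing rule that the $lookup expresses (for each DOOR_OPEN, take the first later DOOR_CLOSE of the same user) can be checked against the sample data with a small plain JavaScript sketch. It is an illustration only, not a replacement for the pipeline, and names such as `durationsByUser` are made up.

```javascript
const events = [
  { _id: 1, name: "DOOR_OPEN", userID: "user1", timestamp: new Date("2019-10-24T08:00:00Z") },
  { _id: 2, name: "DOOR_OPEN", userID: "user2", timestamp: new Date("2019-10-24T08:05:00Z") },
  { _id: 3, name: "DOOR_CLOSE", userID: "user1", timestamp: new Date("2019-10-24T08:10:00Z") },
  { _id: 4, name: "DOOR_OPEN", userID: "user1", timestamp: new Date("2019-10-24T08:30:00Z") },
  { _id: 5, name: "SOME_OTHER_EVENT", userID: "user3", timestamp: new Date("2019-10-24T08:35:00Z") },
  { _id: 6, name: "DOOR_CLOSE", userID: "user2", timestamp: new Date("2019-10-24T08:40:00Z") },
  { _id: 7, name: "DOOR_CLOSE", userID: "user1", timestamp: new Date("2019-10-24T08:50:00Z") },
  { _id: 8, name: "DOOR_OPEN", userID: "user2", timestamp: new Date("2019-10-24T08:55:00Z") },
];

const MINUTE = 60 * 1000;
const sorted = [...events].sort((a, b) => a.timestamp - b.timestamp);

const durationsByUser = {};
for (const open of sorted.filter((e) => e.name === "DOOR_OPEN")) {
  // First DOOR_CLOSE of the same user that happens after this DOOR_OPEN.
  const close = sorted.find(
    (e) => e.name === "DOOR_CLOSE" && e.userID === open.userID && e.timestamp > open.timestamp
  );
  if (!close) continue; // door never closed: no duration to record
  if (!durationsByUser[open.userID]) durationsByUser[open.userID] = [];
  durationsByUser[open.userID].push((close.timestamp - open.timestamp) / MINUTE);
}

// Average duration in minutes per user.
const averageMinutes = {};
for (const [user, durations] of Object.entries(durationsByUser)) {
  averageMinutes[user] = durations.reduce((sum, d) => sum + d, 0) / durations.length;
}

console.log(averageMinutes);
```

This gives 15 minutes for user1 and 35 for user2, matching the result above. Like the pipeline, it would pair two consecutive opens with the same close, so it assumes opens and closes strictly alternate per user.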
You could use the $group stage of the aggregation framework to group by userID and calculate averages. Note that this averages the raw timestamp values per user rather than the open-to-close durations:
db.events.aggregate([{
$group: {
_id: "$userID",
averageTimestamp: {$avg: "$timestamp"}
}
}]);
If you also want to discard every event other than DOOR_OPEN or DOOR_CLOSE, add a $match stage to the aggregation pipeline:
db.events.aggregate([{
$match: {
$or: [{name: "DOOR_OPEN"},{name: "DOOR_CLOSE"}]
}
}, {
$group: {
_id: "$userID",
averageTimestamp: {$avg: "$timestamp"}
}
}]);

Mongo shell query for survey stats ($unwind with 2D array)

My document structure (only 3 documents shown, to give the idea):
/* 1 */
{
"_id" : ObjectId("59edc58af33e9b5988b875fa"),
"Agent" : {
"Name" : "NomanAgent",
"Location" : "Lahore",
"AgentId" : 66,
"Suggestion" : [
"Knowledge",
"Professionalisn"
]
},
"Rating" : 2,
"Status" : "Submitted"
}
/* 2 */
{
"_id" : ObjectId("59edc58af33e9b5988b875fb"),
"Agent" : {
"Name" : "NomanAgent",
"Location" : "Lahore",
"AgentId" : 66,
"Suggestion" : [
"Knowledge",
"Clarity"
]
},
"Rating" : 1,
"Status" : "Submitted"
}
/* 3 */
{
"_id" : ObjectId("59edc58af33e9b5988b875fc"),
"Agent" : {
"Name" : "NomanAgent",
"Location" : "Lahore",
"AgentId" : 66,
"Reward" : "Thumb Up"
},
"Rating" : 5,
"Status" : "Submitted"
}
These are survey responses, so an Agent object can contain either a Suggestion (in case of a bad customer review) or a Reward (in case of a happy customer). Here I am showing 2 documents with Suggestions and 1 with a Reward.
I have created a query for the Rewards, which is given below:
db.getCollection('_survey.response').aggregate([
{
$group:{
_id: "$Agent.Name",
Rating: {$avg: "$Rating"},
Rewards: {$push: "$Agent.Reward"},
Status: {$push : "$Status"}
}
},
{
$unwind: "$Rewards"
},
{
$group:{
_id: {
Agent: "$_id",
Rating: "$Rating",
Rewards: "$Rewards"
},
RewardCount:{$sum: 1},
SurveyStatus: {$first: "$Status"}
}
},
{
$group:{
_id: "$_id.Agent",
Rewards: {$push:{Reward: "$_id.Rewards", Count: "$RewardCount"}},
Rating: {$first: "$_id.Rating"},
SurveyStatus: {$first: "$SurveyStatus"}
}
},
{
$unwind: "$SurveyStatus"
},
{
$group:{
_id: {
Agent: "$_id",
Survey: "$SurveyStatus"
},
StatusCount:{$sum : 1},
Rating: {$first: "$Rating"},
Rewards: {$first: "$Rewards"}
}
},
{
$group:{
_id: "$_id.Agent",
Status:{$push:{Status: "$_id.Survey", Count: "$StatusCount"}},
Rewards: {$first: "$Rewards"},
Rating: {$first: "$Rating"}
}
},
{
$project:{
_id: 0,
Agent: "$_id",
Rating: {
$multiply:[
{$divide:["$Rating",5]},
100
]
},
Status: 1,
Rewards: 1
}
}
]);
The above query works perfectly fine for the rewards. I want exactly the same thing for suggestions, and I would be happy if it is possible to handle Suggestions in the same query (a separate query for suggestions is also fine).
Response of the above query:
/* 1 */
{
"Status" : [
{
"Status" : "Submitted",
"Count" : 2.0
},
{
"Status" : "Pending",
"Count" : 1.0
},
{
"Status" : "Opened",
"Count" : 2.0
}
],
"Rewards" : [
{
"Reward" : "Thumb Up",
"Count" : 1.0
},
{
"Reward" : "Thank You",
"Count" : 2.0
}
],
"Agent" : "GhazanferAgent",
"Rating" : 68.0
}
/* 2 */
{
"Status" : [
{
"Status" : "Opened",
"Count" : 2.0
},
{
"Status" : "Viewed",
"Count" : 2.0
},
{
"Status" : "Pending",
"Count" : 3.0
}
],
"Rewards" : [
{
"Reward" : "Gift",
"Count" : 1.0
},
{
"Reward" : "Thumb Up",
"Count" : 3.0
},
{
"Reward" : "Thank You",
"Count" : 1.0
}
],
"Agent" : "NomanAgent",
"Rating" : 60.0
}
What I have tried so far: I can think of two approaches, but I find an issue with each of them.
First (find the average rating and push statuses and suggestions into arrays):
db.getCollection("_survey.response").aggregate([
{
$match:
{
$and:[
{
"Agent.Suggestion":{
$exists: true
}
},
{
Rating: {$lte: 3}
}
]
}
},
{
$group:{
_id: {
AgentName: "$Agent.Name",
AgentId: "$Agent.AgentId",
Location: "$Agent.Location"
},
Rating: {$avg: "$Rating"},
Status: {$push : "$Status"},
Suggestions: {$push: "$Agent.Suggestion"}
}
}
]);
The issue with this approach is that Suggestions in the result becomes an array of arrays (each Suggestion was already an array) of dynamic size, depending on how many times an agent gets suggestions in customer responses. So the problem is applying $unwind to a 2D array of dynamic size.
Second ($unwind the suggestions in the first stage, while they are still a 1D array, to avoid the $unwind issue on a 2D array of dynamic size):
db.getCollection("_survey.response").aggregate([
{
$match:
{
$and:[
{
"Agent.Suggestion":{
$exists: true
}
},
{
Rating: {$lte: 3}
}
]
}
},
{
$unwind: "$Agent.Suggestion"
},
{
$group: {
_id:{
AgentName: "$Agent.Name",
AgentId: "$Agent.AgentId",
Suggestion: "$Agent.Suggestion",
Location: "$Agent.Location"
},
Status: {$push: "$Status"},
Rating: {$avg: "$Rating"},
Count: {$sum: 1}
}
}
]);
The problem with this approach is that $unwind on the Suggestion array flattens all suggestions with their respective agents, increasing the number of documents compared to the original responses. As a result I cannot get the correct average rating for each agent from this grouping, and the same goes for Status: I can compute those two fields correctly only if I group by agent alone, while here I am grouping by agent together with suggestion.
I want exactly the same response for the Suggestion query, with a Suggestions object replacing the Rewards object (or, even better, a Suggestions object in the same response).
Survey status can be Pending, Opened, Viewed, Submitted, etc.
Output explanation:
I want suggestions (with counts), statuses (with counts) and the rating as a percentage (which I am already doing) for each agent, as you can see in the output above.
Thanks in advance!
Using $unwind two consecutive times did the trick for me, building on the first approach:
db.getCollection("_survey.response").aggregate([
{
$match:
{
$and:[
{
"Agent.Suggestion":{
$exists: true
}
},
{
Rating: {$lte: 3}
}
]
}
},
{
$group:{
_id: {
AgentName: "$Agent.Name",
AgentId: "$Agent.AgentId",
Location: "$Agent.Location"
},
Rating: {$avg: "$Rating"},
Status: {$push : "$Status"},
Suggestions: {$push: "$Agent.Suggestion"}
}
},
{
$unwind: "$Suggestions"
},
{
$unwind: "$Suggestions"
},
{
$group: {
_id: {
Suggestions: "$Suggestions",
AgentName: "$_id.AgentName",
AgentId: "$_id.AgentId",
Location: "$_id.Location"
},
SuggestionCount: {$sum: 1},
Rating: {$first: "$Rating"},
Status: {$first: "$Status"}
}
},
{
$group: {
_id:{
AgentName: "$_id.AgentName",
AgentId: "$_id.AgentId",
Location: "$_id.Location"
},
Suggestions: {$push:{Suggestion: "$_id.Suggestions", Count: "$SuggestionCount"}},
TotalSuggestions: {$sum: "$SuggestionCount"},
Rating: {$first: "$Rating"},
Status: {$first: "$Status"}
}
},
{
$unwind: "$Status"
},
{
$group:{
_id: {
AgentName: "$_id.AgentName",
AgentId: "$_id.AgentId",
Location: "$_id.Location",
Status: "$Status"
},
StatusCount:{$sum : 1},
Rating: {$first: "$Rating"},
Suggestions: {$first: "$Suggestions"},
TotalSuggestions: {$first: "$TotalSuggestions"}
}
},
{
$group:{
_id: {
AgentName: "$_id.AgentName",
AgentId: "$_id.AgentId",
Location: "$_id.Location"
},
Status:{$push:{Status: "$_id.Status", Count: "$StatusCount"}},
TotalStatus: {$sum: "$StatusCount"},
Suggestions: {$first: "$Suggestions"},
TotalSuggestions: {$first: "$TotalSuggestions"},
Rating: {$first: "$Rating"}
}
},
{
$project: {
_id: 0,
AgentName: "$_id.AgentName",
AgentId: "$_id.AgentId",
Location: "$_id.Location",
Status: 1,
TotalStatus: 1,
Suggestions: 1,
TotalSuggestions: 1,
Performance: {
$concat: [
{
$substr: [
{
$multiply:[
{$divide:["$Rating",5]},
100
]
}, 0, 4
]
},"%"
]
}
}
}
]);
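
To see why two consecutive $unwind stages are needed, it may help to mirror the first few stages in plain JavaScript. The sketch below uses the three sample responses from the question (trimmed to the fields involved). The variable names are hypothetical, and the `Array.isArray` check only approximates `$exists: true`.

```javascript
const responses = [
  { Agent: { Name: "NomanAgent", Suggestion: ["Knowledge", "Professionalisn"] }, Rating: 2, Status: "Submitted" },
  { Agent: { Name: "NomanAgent", Suggestion: ["Knowledge", "Clarity"] }, Rating: 1, Status: "Submitted" },
  { Agent: { Name: "NomanAgent", Reward: "Thumb Up" }, Rating: 5, Status: "Submitted" },
];

// $match: keep responses that carry suggestions and have a rating of 3 or less.
const matched = responses.filter((r) => Array.isArray(r.Agent.Suggestion) && r.Rating <= 3);

// First $group: the average is taken BEFORE any unwinding, so it stays correct,
// and $push of an array field produces an array of arrays.
const avgRating = matched.reduce((sum, r) => sum + r.Rating, 0) / matched.length;
const pushed = matched.map((r) => r.Agent.Suggestion);

// First $unwind peels off the outer array, second $unwind the inner one.
const flat = pushed.flat();

// Second $group: count how often each suggestion occurs.
const suggestionCounts = {};
for (const s of flat) suggestionCounts[s] = (suggestionCounts[s] || 0) + 1;

// Rating as a percentage of the maximum rating of 5.
const performancePct = (avgRating / 5) * 100;

console.log(pushed, flat, suggestionCounts, performancePct);
```

Because the average rating is computed in the first group, before any flattening, the extra documents created by the unwinds no longer distort it.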

Group values by sub string in MongoDB

I have these documents in my collection:
{_id: "aaaaaaaa", email: "mail1#orange.fr"},
{_id: "bbbbbbbb", email: "mail2#orange.fr"},
{_id: "cccccccc", email: "mail3#orange.fr"},
{_id: "dddddddd", email: "mail4#gmail.com"},
{_id: "eeeeeeee", email: "mail5#gmail.com"},
{_id: "ffffffff", email: "mail6#yahoo.com"}
And I would like this result:
{
result: [
{domain: "orange.fr", count: 3},
{domain: "gmail.com", count: 2},
{domain: "yahoo.com", count: 1},
]
}
I'm not sure whether this can be done with the aggregation framework and the $regex operator.
Aggregation Framework
I don't believe that you can achieve the desired result with the aggregation framework given the present document structure. If you stored the domain name in a separate field, it would become trivial:
db.items.aggregate([
{
$group:
{
_id: "$emailDomain",
count: { $sum: 1 }
}
}
])
Map-Reduce
It's possible to implement what you want using a simple map-reduce aggregation. Naturally, the performance will not be good on large collections.
Query
db.emails.mapReduce(
function() {
if (this.email) {
var parts = this.email.split('#');
emit(parts[parts.length - 1], 1);
}
},
function(key, values) {
return Array.sum(values);
},
{
out: { inline: 1 }
}
)
Output
[
{
"_id" : "gmail.com",
"value" : 2
},
{
"_id" : "yahoo.com",
"value" : 1
},
{
"_id" : "orange.fr",
"value" : 3
}
]
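
For readers unfamiliar with map-reduce, the two functions above boil down to "emit (domain, 1) per document, then sum per key". A plain JavaScript sketch of those two steps on the question's sample documents follows. It only imitates the idea and does not use MongoDB; the names `emitted` and `totals` are invented.

```javascript
const docs = [
  { _id: "aaaaaaaa", email: "mail1#orange.fr" },
  { _id: "bbbbbbbb", email: "mail2#orange.fr" },
  { _id: "cccccccc", email: "mail3#orange.fr" },
  { _id: "dddddddd", email: "mail4#gmail.com" },
  { _id: "eeeeeeee", email: "mail5#gmail.com" },
  { _id: "ffffffff", email: "mail6#yahoo.com" },
];

// Map step: emit (domain, 1) for every document that has an email.
const emitted = [];
for (const doc of docs) {
  if (doc.email) {
    const parts = doc.email.split("#");
    emitted.push([parts[parts.length - 1], 1]);
  }
}

// Reduce step: sum the values emitted under each key.
const totals = {};
for (const [key, value] of emitted) {
  totals[key] = (totals[key] || 0) + value;
}

console.log(totals);
```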
Aggregation Framework
From MongoDB 3.4 (released Nov 29, 2016) onwards, the aggregation framework has string operators such as $indexOfBytes and $strLenBytes that make this possible:
[
{
$project: {
domain: {
$substr: ["$email", {
$indexOfBytes: ["$email", "#"]
}, {
$strLenBytes: "$email"
}]
}
}
},
{
$group: {
_id: '$domain',
count: {
$sum: 1
}
}
},
{
$sort: {
'count': -1
}
},
{
$group: {
_id: null,
result: {
$push: {
'domain': "$_id",
'count': '$count'
}
}
}
}
]
Results
{
_id: null,
result: [
{domain: "#orange.fr", count: 3},
{domain: "#gmail.com", count: 2},
{domain: "#yahoo.com", count: 1},
]
}
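
The domains above keep the leading `#` because the substring starts at the separator's own index. Starting one position later drops it. In the pipeline that would presumably mean wrapping the `$indexOfBytes` expression in `{ $add: [ ..., 1 ] }`, which is an untested suggestion and not part of the original answer. The off-by-one itself is easy to see in plain JavaScript (both function names are made up for the example):

```javascript
// Substring starting AT the separator keeps the "#" (what the pipeline does).
const domainWithSeparator = (email) => email.substring(email.indexOf("#"));

// Substring starting one character AFTER the separator drops it.
const domainOnly = (email) => email.substring(email.indexOf("#") + 1);

console.log(domainWithSeparator("mail1#orange.fr")); // "#orange.fr"
console.log(domainOnly("mail1#orange.fr")); // "orange.fr"
```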

Find documents having a specific count of matches in an array

I've searched high and low but have not been able to find what I'm looking for, so apologies if this has already been asked.
Consider the following documents
{
_id: 1,
items: [
{
category: "A"
},
{
category: "A"
},
{
category: "B"
},
{
category: "C"
}]
},
{
_id: 2,
items: [
{
category: "A"
},
{
category: "B"
}]
},
{
_id: 3,
items: [
{
category: "A"
},
{
category: "A"
},
{
category: "A"
}]
}
I'd like to be able to find those documents which have more than 1 category "A" item in the items array. So this should find documents 1 and 3.
Is this possible?
Using aggregation
> db.spam.aggregate([
{$unwind: "$items"},
{$match: {"items.category" :"A"}},
{$group: {
_id: "$_id",
item: {$push: "$items.category"}, count: {$sum: 1}}
},
{$match: {count: {$gt: 1}}}
])
Output
{ "_id" : 3, "item" : [ "A", "A", "A" ], "count" : 3 }
{ "_id" : 1, "item" : [ "A", "A" ], "count" : 2 }
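
The same check can be expressed without unwinding by counting the matching items inside each document, which is roughly what combining `$filter` with `$size` would do in a pipeline. The plain JavaScript sketch below shows the counting logic on the question's three documents. It is an illustration with made-up variable names, not a tested MongoDB query.

```javascript
const docs = [
  { _id: 1, items: [{ category: "A" }, { category: "A" }, { category: "B" }, { category: "C" }] },
  { _id: 2, items: [{ category: "A" }, { category: "B" }] },
  { _id: 3, items: [{ category: "A" }, { category: "A" }, { category: "A" }] },
];

// Count the category "A" items per document, then keep documents with more than one.
const matches = docs
  .map((d) => ({ _id: d._id, count: d.items.filter((i) => i.category === "A").length }))
  .filter((d) => d.count > 1);

console.log(matches);
```

Documents 1 and 3 remain, with counts 2 and 3, in line with the aggregation output above.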