I have a MongoDB database with the following document structure:
{
"name": "ServiceA",
"areas": ["X", "Y", "Z"],
"tags": [
{
"name": "Financial",
"type": "A"
},
{
"name": "Consumer",
"type": "B"
}
]
}
There are many entries, each with the same structure and drawing on the same set of areas.
There are many predefined tag names, sorted into a few types.
The aim is to group by area and then count the number of occurrences of each tag name and tag type, giving an output like this:
{
"area": "X",
"count": 100, // Total entries with X as an area
"tagNameCount": {
"Financial": 20,
"Consumer": 10,
...
},
"tagTypeCount": {
"A": 70,
"B": 40
}
}
I started off by using $unwind on areas, but I'm stuck on the next steps. I understand that I need $group, but I can't work out how to count the occurrences.
You can use the $facet operator, which lets you run several aggregation pipelines on the same input documents within a single stage.
Walkthrough
1. We $unwind areas and tags
2. With $facet, we run 3 parallel aggregations:
2.1 We count the distinct documents for each area
2.2 We count the distinct documents for each tag name within each area
2.3 We count the distinct documents for each tag type within each area
3. We $unwind the areas array and filter the other 2 arrays down to the matching area
4. We assemble the desired output
db.collection.aggregate([
{
$unwind: "$areas"
},
{
$unwind: "$tags"
},
{
$facet: {
areas: [
{
$group: {
_id: "$areas",
count: {
$addToSet: "$_id"
}
}
},
{
$project: {
_id: 0,
area: "$_id",
count: {
$size: "$count"
}
}
}
],
tagNameCount: [
{
$group: {
_id: {
name: "$tags.name",
areas: "$areas"
},
count: {
$addToSet: "$_id"
}
}
},
{
$group: {
_id: "$_id.areas",
tagNameCount: {
$push: {
k: "$_id.name",
v: {
$size: "$count"
}
}
}
}
},
{
$addFields: {
tagNameCount: {
$arrayToObject: "$tagNameCount"
}
}
}
],
tagTypeCount: [
{
$group: {
_id: {
type: "$tags.type",
areas: "$areas"
},
count: {
$addToSet: "$_id"
}
}
},
{
$group: {
_id: "$_id.areas",
tagTypeCount: {
$push: {
k: "$_id.type",
v: {
$size: "$count"
}
}
}
}
},
{
$addFields: {
tagTypeCount: {
$arrayToObject: "$tagTypeCount"
}
}
}
]
}
},
{
$unwind: "$areas"
},
{
$addFields: {
"tagNameCount": {
$filter: {
input: "$tagNameCount",
cond: {
$eq: [
"$areas.area",
"$$this._id"
]
}
}
},
"tagTypeCount": {
$filter: {
input: "$tagTypeCount",
cond: {
$eq: [
"$areas.area",
"$$this._id"
]
}
}
}
}
},
{
$project: {
area: "$areas.area",
count: "$areas.count",
tagNameCount: {
$arrayElemAt: [
"$tagNameCount.tagNameCount",
0
]
},
tagTypeCount: {
$arrayElemAt: [
"$tagTypeCount.tagTypeCount",
0
]
}
}
},
{
$sort: {
area: 1
}
}
])
MongoPlayground
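To check what the pipeline above should return, it can help to compute the same counts in plain JavaScript on a small sample. The following is only an in-memory sketch, not MongoDB code: the function name countByArea and the sample documents (ServiceB, the Retail tag) are made up for illustration. It mirrors the $addToSet on _id, so a document counts once per tag name or tag type even if it carries two tags of the same type.

```javascript
// In-memory reference for the per-area counts (illustrative, not MongoDB code).
function countByArea(docs) {
  const byArea = new Map();
  for (const doc of docs) {
    // Sets mirror $addToSet on _id: each document counts once per area/name/type.
    for (const area of new Set(doc.areas)) {
      if (!byArea.has(area)) {
        byArea.set(area, { area, count: 0, tagNameCount: {}, tagTypeCount: {} });
      }
      const entry = byArea.get(area);
      entry.count += 1;
      for (const name of new Set(doc.tags.map((t) => t.name))) {
        entry.tagNameCount[name] = (entry.tagNameCount[name] || 0) + 1;
      }
      for (const type of new Set(doc.tags.map((t) => t.type))) {
        entry.tagTypeCount[type] = (entry.tagTypeCount[type] || 0) + 1;
      }
    }
  }
  // Same ordering as the final { $sort: { area: 1 } } stage.
  return [...byArea.values()].sort((a, b) => a.area.localeCompare(b.area));
}

// Hypothetical sample data in the shape shown in the question.
const sampleDocs = [
  { name: "ServiceA", areas: ["X", "Y"], tags: [{ name: "Financial", type: "A" }, { name: "Consumer", type: "B" }] },
  { name: "ServiceB", areas: ["X"], tags: [{ name: "Financial", type: "A" }, { name: "Retail", type: "A" }] },
];
const areaCounts = countByArea(sampleDocs);
console.log(JSON.stringify(areaCounts, null, 2));
```

For area X this gives count 2 and a tagTypeCount of A: 2, B: 1, because ServiceB is counted once for type A despite having two type A tags.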
Here's one method:
unwind both areas and tags
for each area collect the applicable tags, and the unique names and types
count the names to get the total number of tags
for each unique name, count the matching values in the tags
do the same for each unique type
project out the unique fields
db.collection.aggregate([
{$unwind: "$areas"},
{$unwind: "$tags"},
{$group: {
_id: "$areas",
names: {$push: "$tags.name"},
uniqueNames: {$addToSet: "$tags.name"},
types: {$push: "$tags.type"},
uniqueTypes: {$addToSet: "$tags.type"}
}},
{$addFields: {
count: {$size: "$names"},
names: {
$arrayToObject: {
$map: {
input: "$uniqueNames",
as: "needle",
in: {
k: "$$needle",
v: {
$size: {
$filter: {
input: "$names",
cond: {$eq: ["$$this","$$needle"]}
}}}}}}},
types: {
$arrayToObject: {
$map: {
input: "$uniqueTypes",
as: "needle",
in: {
k: "$$needle",
v: {$size: {
$filter: {
input: "$types",
cond: { $eq: [ "$$this","$$needle"]}
}}}}}}}}},
{
$project: {
uniqueNames: 0,
uniqueTypes: 0
}}
])
Playground
How can I get only the objects in the sales array that match the date 2021-10-14?
My aggregate query currently returns all objects of the sales array if at least one of them matches.
Dataset Documents
{
"name": "#0",
"sales": [{
"date": "2021-10-14",
"price": 3.69,
},{
"date": "2021-10-15",
"price": 2.79,
}]
},
{
"name": "#1",
"sales": [{
"date": "2021-10-14",
"price": 1.5,
}]
}
Aggregate
{
$match: {
sales: {
$elemMatch: {
date: '2021-10-14',
},
},
},
},
{
$group: {
_id: 0,
data: {
$push: '$sales',
},
},
},
{
$project: {
data: {
$reduce: {
input: '$data',
initialValue: [],
in: {
$setUnion: ['$$value', '$$this'],
},
},
},
},
}
Result
{"date": "2021-10-14","price": 3.69},
{"date": "2021-10-15","price": 2.79},
{"date": "2021-10-14","price": 1.5}
Result Expected
{"date": "2021-10-14","price": 3.69},
{"date": "2021-10-14","price": 1.5}
You need a $replaceRoot or $replaceWith stage, which takes an expression that evaluates to the new root document. Here that expression uses $filter to keep the sales matching the date, and $arrayElemAt (or $first) to pick the first match from each document:
[
{ $match: { 'sales.date': '2021-10-14' } },
{ $replaceWith: {
$arrayElemAt: [
{
$filter: {
input: '$sales',
cond: { $eq: ['$$this.date', '2021-10-14'] }
}
},
0
]
} }
]
OR
[
{ $match: { 'sales.date': '2021-10-14' } },
{ $replaceRoot: {
newRoot: {
$arrayElemAt: [
{
$filter: {
input: '$sales',
cond: { $eq: ['$$this.date', '2021-10-14'] }
}
},
0
]
}
} }
]
Mongo Playground
In the $project stage, wrap the $reduce expression in a $filter: $reduce supplies the merged input array and $filter keeps only the sales with the matching date.
{
$project: {
data: {
$filter: {
input: {
$reduce: {
input: "$data",
initialValue: [],
in: {
$setUnion: [
"$$value",
"$$this"
],
}
}
},
cond: {
$eq: [
"$$this.date",
"2021-10-14"
]
}
}
}
}
}
Sample Mongo Playground
How about using $unwind:
.aggregate([
{$match: { sales: {$elemMatch: {date: '2021-10-14'} } }},
{$unwind: '$sales'},
{$match: {'sales.date': '2021-10-14'}},
{$project: {date: '$sales.date', price: '$sales.price', _id: 0}}
])
This will separate the sales into different documents, each containing only one sale, and allow you to match conditions easily.
See: https://docs.mongodb.com/manual/reference/operator/aggregation/unwind/
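As a quick way to sanity-check any of the pipelines above, the expected result can be reproduced in plain JavaScript. This is only an in-memory sketch (the helper name salesOnDate is made up here), using the two documents from the question:

```javascript
// In-memory equivalent of "unwind sales, then keep the matching date".
function salesOnDate(docs, date) {
  // flatMap plays the role of $unwind; filter plays the role of the second $match.
  return docs.flatMap((doc) => doc.sales.filter((sale) => sale.date === date));
}

const salesDocs = [
  { name: "#0", sales: [{ date: "2021-10-14", price: 3.69 }, { date: "2021-10-15", price: 2.79 }] },
  { name: "#1", sales: [{ date: "2021-10-14", price: 1.5 }] },
];
const matched = salesOnDate(salesDocs, "2021-10-14");
console.log(matched);
```

The output holds the two sales dated 2021-10-14, which is the "Result Expected" from the question.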
My objective is to write an efficient query, that with the given input, gives me the expected output. I have some working solution, but all "types" are "manually" written, so I guess I'm looking for help to get the same output but in a different way.
input
reportId  type    weight
A         "fish"  4
A         "fish"  2
A         "cow"   0
B         "fish"  2
B         "tuna"  1
B         "bird"
Expected output
[
{
reportId: "A",
totalCount: 3,
totalWeight: 6,
fishCount: 2,
tunaCount: 0,
cowCount: 1,
birdCount: 0
},
{
reportId: "B",
totalCount: 3,
totalWeight: 2,
fishCount: 1,
tunaCount: 1,
cowCount: 0,
birdCount: 1
},
]
Partial "hard-coded" solution
What I have been doing so far is to create 2 group-by steps. It kind of gets the job done, but in my real use case there are a lot of types, so the group stages get very long.
[
{
$group: {
_id: { reportId: "$reportId", type: "$type" },
count: { $sum: 1 },
totalWeight: { $sum: "$weight" }
}
},
{
$group: {
_id: "$_id.reportId",
totalCount: { $sum: "$count" },
totalWeight: { $sum: "$totalWeight" },
fishCount: {
$sum: {
$cond: {
"if": { $eq: ["$_id.type", "fish"] },
then: "$count",
else: 0
}
}
},
tunaCount: {
$sum: {
$cond: {
"if": { $eq: ["$_id.type", "tuna"] },
then: "$count",
else: 0
}
}
},
// <== And here I have a count block for each type. Can I get the same result in a better way?
}
}
]
I will focus on the second part, which is the difficult one. I don't know whether there is a shorter or better solution, but this one should work:
db.collection.aggregate([
{
$unset: "_id"
},
{
$set: {
data: {
"$objectToArray": "$$ROOT"
}
}
},
{
$group: {
_id: "$reportId",
data: {
$push: "$data"
}
}
},
{
$set: {
data: {
$reduce: {
input: "$data",
initialValue: [],
in: {
$concatArrays: [
"$$value",
"$$this"
]
}
}
}
}
},
{
$set: {
data: {
$filter: {
input: "$data",
cond: {
$not: {
$in: [
"$$this.k",
[
"totalCount",
"totalWeight"
]
]
}
}
}
}
}
},
{
$unwind: "$data"
},
{
$group: {
_id: "$_id",
data: {
$push: "$data"
}
}
},
{
$replaceRoot: {
newRoot: {
$arrayToObject: "$data"
}
}
}
])
See Mongo playground
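The underlying idea of building the `<type>Count` keys from the data, instead of writing one block per type, can be sketched in plain JavaScript. This is an in-memory illustration only, not a MongoDB pipeline; the function name summarize and the knownTypes parameter (used to zero-fill types a report does not contain) are assumptions made for this example. The sample uses only the rows for report A from the question.

```javascript
// Build per-report totals with dynamically named "<type>Count" keys.
function summarize(rows, knownTypes) {
  const reports = new Map();
  for (const { reportId, type, weight } of rows) {
    if (!reports.has(reportId)) {
      const fresh = { reportId, totalCount: 0, totalWeight: 0 };
      // Zero-fill every known type so absent types still appear with 0.
      for (const t of knownTypes) fresh[`${t}Count`] = 0;
      reports.set(reportId, fresh);
    }
    const report = reports.get(reportId);
    report.totalCount += 1;
    report.totalWeight += weight;
    const key = `${type}Count`; // the key is derived from the data
    report[key] = (report[key] || 0) + 1;
  }
  return [...reports.values()];
}

const reportRows = [
  { reportId: "A", type: "fish", weight: 4 },
  { reportId: "A", type: "fish", weight: 2 },
  { reportId: "A", type: "cow", weight: 0 },
];
const summary = summarize(reportRows, ["fish", "tuna", "cow", "bird"]);
console.log(summary);
```

For report A this yields totalCount 3, totalWeight 6, fishCount 2, cowCount 1 and 0 for tuna and bird, matching the first entry of the expected output.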
I'm new to MongoDB and having some problems. My documents contain a fixed-size int array whose values I need to sum.
Can MongoDB sum int arrays element-wise in a query grouped by another field? In my case, that field is a date stored as a string.
example data:
{d:[1,2], date:"17-01-2020"} {d:[3,4], date:"17-01-2020"} {d:[5,6], date:"18-01-2020"}
Query result that I want:
{d:[4, 6], date:"17-01-2020"} {d:[5,6], date:"18-01-2020"}
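If the arrays have a fixed size of two values, you can use this one (it sums over the whole collection; use _id: "$date" in the $group to get one result per date):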
db.collection.aggregate([
{ $group: { _id: null, data: { $push: "$d" } } },
{
$set: {
d: [
{$sum:{ $map: { input: "$data", in: { $first: "$$this" } } }},
{$sum:{ $map: { input: "$data", in: { $last: "$$this" } } }}
]
}
},
{$unset: "data"}
])
For any length use this one:
db.collection.aggregate([
{ $group: { _id: null, data: { $push: "$d" } } },
{
$set: {
d: {
$map: {
input: { $range: [0, { $size: { $first: "$data" } }] },
as: "idx",
in: { $sum: { $map: { input: "$data", in: { $arrayElemAt: ["$$this", "$$idx"] } } } }
}
}
}
}
])
The first array determines the length of all the other arrays.
Another approach is this one:
db.collection.aggregate([
{ $unwind: { path: "$d", includeArrayIndex: "idx" } },
{ $group: { _id: "$idx", d: { $sum: "$d" } } },
{ $sort: { _id: 1 } },
{ $group: { _id: null, d: { $push: "$d" } } }
])
Or, if you want to keep other fields such as date, group on the date together with the index:
db.collection.aggregate([
{ $unwind: { path: "$d", includeArrayIndex: "idx" } },
{ $group: { _id: { date: "$date", idx: "$idx" }, d: { $sum: "$d" } } },
{ $sort: { "_id.date": 1, "_id.idx": 1 } },
{ $group: { _id: "$_id.date", d: { $push: "$d" } } },
{ $project: { d: 1, date: "$_id" } }
])
This works even if the arrays do not all have the same length.
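For comparison, here is the element-wise sum per date written in plain JavaScript, which can be used to verify what the pipelines should return. It is an in-memory sketch, not MongoDB code, and the function name sumArraysByDate is made up for this example; the data is the sample from the question.

```javascript
// Sum arrays position by position, grouped by date.
function sumArraysByDate(docs) {
  const sums = new Map();
  for (const { d, date } of docs) {
    const acc = sums.get(date) || [];
    // Add each element to the running total at the same index.
    d.forEach((value, idx) => {
      acc[idx] = (acc[idx] || 0) + value;
    });
    sums.set(date, acc);
  }
  return [...sums].map(([date, d]) => ({ d, date }));
}

const arrayDocs = [
  { d: [1, 2], date: "17-01-2020" },
  { d: [3, 4], date: "17-01-2020" },
  { d: [5, 6], date: "18-01-2020" },
];
const summed = sumArraysByDate(arrayDocs);
console.log(summed);
```

Like the $unwind approach, this does not require the arrays to share a length: shorter arrays simply contribute nothing at the missing positions.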
Hey, I need to get the sum of all totalPrice values grouped by day.
I get a result,
but I need to fetch all the remaining days of the month as well, even if they return 0.
I need a solution.
This is my code:
Order.aggregate([
{ $project: { yearMonthDay: { $dateToString: { format: "%Y-%m-%d", date: '$created' }}, totalPrice:"$totalPrice" }},
{ $group: { _id: "$yearMonthDay", count: { $sum: 1 }, total: {"$sum": "$totalPrice"} }},
{ $sort: { _id: -1 } },
{ $group: { _id: null, stats: { $push: "$$ROOT" }}},
{
$project: {
results: {
$map: {
input:{ $range:[16,31] },
as: 'day',
in: {
$let: {
vars: {
dateIndex: {
"$indexOfArray": ["$stats._id", {$dateToString:{ date:{$dateFromParts:{'year':2020, 'month':5, 'day':"$$day"}}, format:'%Y-%m-%d'}}]
}
},
in: {
$cond: {
if: { $ne: ["$$dateIndex", -1] },
then: { $arrayElemAt: ["$stats", "$$dateIndex"] },
else: { _id: {$dateToString:{ date:{$dateFromParts:{'year':2020, 'month':5, 'day':"$$day"}}, format:'%Y-%m-%d'}}, count: 0, total: 0 }
}
}
}
}
}
}
}
},
{ $unwind: "$results" },
{ $replaceRoot: { newRoot: "$results"}}
])
This query should work for you.
db.collectionName.aggregate([
{ $project: { yearMonthDay: { $dateToString: { format: "%Y-%m-%d", date: '$created' }}, totalPrice:"$totalPrice" }},
{ $group: { _id: "$yearMonthDay", count: { $sum: 1 }, total: {"$sum": "$totalPrice"} }},
{ $sort: { _id: -1 } },
{ $group: { _id: null, stats: { $push: "$$ROOT" }}},
{
$project: {
results: {
$map: {
input: ["2020-05-16","2020-05-15","2020-05-14","2020-05-13","2020-05-12"],
as: "date",
in: {
$let: {
vars: {
dateIndex: {
"$indexOfArray": ["$stats._id", "$$date"]
}
},
in: {
$cond: {
if: { $ne: ["$$dateIndex", -1] },
then: { $arrayElemAt: ["$stats", "$$dateIndex"] },
else: { _id: "$$date", count: 0, total: 0 }
}
}
}
}
}
}
}
},
{ $unwind: "$results" },
{ $replaceRoot: { newRoot: "$results"}}
])
The first 3 stages are the same as yours.
{ $group: { _id: null, stats: { $push: "$$ROOT" }}} pushes the results of the previous stage into an array, stats, which we use for lookups in the later stage.
In the $project stage, we create the range of possible dates and iterate over it.
For each date in the range:
"$indexOfArray": ["$stats._id", "$$date"] checks whether the date is present in the stats array, returning -1 if it is not.
We then use that index to fetch the value from the stats array, or push default values if the date is missing.
As these results are still nested under results, we unwind that array and move each element to the root.
If your server version is 3.6 or later,
we can simplify the date range creation as well. Let's initialize the input array as days using $range.
input:{ $range:[16,31] },
as: 'day'
and modify the dateIndex part like this:
dateIndex: {
"$indexOfArray": ["$stats._id", {$dateToString:{ date:{$dateFromParts:{'year':2020, 'month':5, 'day':"$$day"}}, format:'%Y-%m-%d'}}]
}
And change the default value part similarly.
else: { _id: {$dateToString:{ date:{$dateFromParts:{'year':2020, 'month':5, 'day':"$$day"}}, format:'%Y-%m-%d'}}, count: 0, total: 0 }
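Alternatively, we can use $concat to generate the keys ($convert needs MongoDB 4.0 or later, and days below 10 would need a leading zero to match the %d format):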
dateIndex: {
"$indexOfArray": ["$stats._id", {$concat:["2020-05","-", {$convert:{input:"$$day", to:"string"}}]}]
}
// And default value
else: { _id: {$concat:["2020-05","-", {$convert:{input:"$$day", to:"string"}}]}, count: 0, total: 0 }
Similarly, you can run another loop for months as well.
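The fill-in logic is easier to follow outside the pipeline. Below is a plain JavaScript sketch of the same idea: generate a key for every day in the range, look it up in the grouped stats, and fall back to zeros. It is not MongoDB code; the function name fillMissingDays and the sample stats are made up for illustration. Note the explicit zero-padding of the day, which the keys need in order to match the "%Y-%m-%d" format.

```javascript
// Return one entry per day in [firstDay, lastDay], using zeros for missing days.
function fillMissingDays(stats, year, month, firstDay, lastDay) {
  const byDate = new Map(stats.map((s) => [s._id, s]));
  const results = [];
  for (let day = firstDay; day <= lastDay; day++) {
    // Zero-pad month and day so keys look like "2020-05-08".
    const key = `${year}-${String(month).padStart(2, "0")}-${String(day).padStart(2, "0")}`;
    results.push(byDate.get(key) || { _id: key, count: 0, total: 0 });
  }
  return results;
}

// Hypothetical grouped output: only one day has orders.
const stats = [{ _id: "2020-05-09", count: 2, total: 30 }];
const filled = fillMissingDays(stats, 2020, 5, 8, 10);
console.log(filled);
```

Here lastDay is inclusive. In the pipeline, the end of $range is exclusive, so $range: [16, 31] stops at day 30; covering the 31st would need 32 as the upper bound.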
I want to group the objects in an array by the value of a specified field and produce a count for each value.
I have the following MongoDB document (irrelevant fields are omitted).
{
arrayField: [
{ fieldA: value1, ...otherFields },
{ fieldA: value2, ...otherFields },
{ fieldA: value2, ...otherFields }
],
...otherFields
}
The following is what I want.
{
arrayField: [
{ fieldA: value1, ...otherFields },
{ fieldA: value2, ...otherFields },
{ fieldA: value2, ...otherFields }
],
newArrayField: [
{ fieldA: value1, count: 1 },
{ fieldA: value2, count: 2 },
],
...otherFields
}
Here I grouped the embedded documents by fieldA.
I know how to do it with $unwind and 2 $group stages, in the following way (irrelevant stages are omitted).
Concrete example
// document structure
{
_id: ObjectId(...),
type: "test",
results: [
{ choice: "a" },
{ choice: "b" },
{ choice: "a" }
]
}
db.test.aggregate([
{ $match: {} },
{
$unwind: {
path: "$results",
preserveNullAndEmptyArrays: true
}
},
{
$group: {
_id: {
_id: "$_id",
type: "$type",
choice: "$results.choice",
},
count: { $sum: 1 }
}
},
{
$group: {
_id: {
_id: "$_id._id",
type: "$_id.type",
result: "$results.choice",
},
groupedResults: { $push: { count: "$count", choice: "$_id.choice" } }
}
}
])
You can use the aggregation below:
db.test.aggregate([
{ "$addFields": {
"newArrayField": {
"$map": {
"input": { "$setUnion": ["$arrayField.fieldA"] },
"as": "m",
"in": {
"fieldA": "$$m",
"count": {
"$size": {
"$filter": {
"input": "$arrayField",
"as": "d",
"cond": { "$eq": ["$$d.fieldA", "$$m"] }
}
}
}
}
}
}
}}
])
The pipeline below adds a new array field, which is generated by:
1. Using $setUnion to get the unique set of array items, with an inner $map to extract only the choice field
2. Using $map on the unique set of items, with an inner $reduce on the original array, to count all items where choice matches
Pipeline:
db.test.aggregate([{
$addFields: {
newArrayField: {
$map: {
input: {
$setUnion: [{
$map: {
input: "$results",
in: { choice: "$$this.choice" }
}
}
]
},
as: "i",
in: {
choice: '$$i.choice',
count: {
$reduce: {
input: "$results",
initialValue: 0,
in: {
$sum: ["$$value", { $cond: [ { $eq: [ "$$this.choice", "$$i.choice" ] }, 1, 0 ] }]
}
}
}
}
}
}
}
}])
The $reduce will iterate over the results array n times, where n is the number of unique values of choice, so the performance will depend on that.
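To make the cost remark concrete, here is what the grouping looks like as a single pass in plain JavaScript. This is an in-memory sketch for comparison only, not something the aggregation pipeline runs; the function name countByField is made up for this example, and the input is the results array from the concrete example in the question.

```javascript
// Count array items by the value of one field in a single pass.
function countByField(items, field) {
  const counts = new Map();
  for (const item of items) {
    counts.set(item[field], (counts.get(item[field]) || 0) + 1);
  }
  // Emit one { <field>: value, count } object per distinct value.
  return [...counts].map(([value, count]) => ({ [field]: value, count }));
}

const results = [{ choice: "a" }, { choice: "b" }, { choice: "a" }];
const grouped = countByField(results, "choice");
console.log(grouped);
```

The Map visits each element once, whereas the $map with an inner $filter or $reduce rescans the array once per distinct value. For small embedded arrays the difference is unlikely to matter.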