My documents look like this:
{
"_id": "1",
"tags": [
{ "code": "01-01", "type": "machine" },
{ "code": "04-06", "type": "gearbox" },
{ "code": "07-01", "type": "machine" }
]
},
{
"_id": "2",
"tags": [
{ "code": "03-04","type": "gearbox" },
{ "code": "01-01", "type": "machine" },
{ "code": "04-11", "type": "machine" }
]
}
I want to get distinct codes only for tags whose type is "machine". So, for the example above, the result should be ["01-01", "07-01", "04-11"].
How do I do this?
Using $unwind and then $group with the tag code as the key will give you each distinct code in a separate document in your result set:
db.collection_name.aggregate([
{
$unwind: "$tags"
},
{
$match: {
"tags.type": "machine"
}
},
{
$group: {
_id: "$tags.code"
}
},
{
$project:{
_id: false,
code: "$_id"
}
}
]);
Or, if you want them put into an array within a single document, you can use $push within a second $group stage:
db.collection_name.aggregate([
{
$unwind: "$tags"
},
{
$match: {
"tags.type": "machine"
}
},
{
$group: {
_id: "$tags.code"
}
},
{
$group:{
_id: null,
codes: {$push: "$_id"}
}
}
]);
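Another user suggested including an initial stage of { $match: { "tags.type": "machine" } }. This is a good idea if your data is likely to contain a significant number of documents that do not include "machine" tags: those documents are eliminated before $unwind, so they are never processed, and a $match at the start of the pipeline can also use an index on tags.type if one exists. Keep the second $match after $unwind, because the first one only selects whole documents and still lets their non-"machine" tags through. Your pipeline would look like this: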
db.collection_name.aggregate([
{
$match: {
"tags.type": "machine"
}
},
{
$unwind: "$tags"
},
{
$match: {
"tags.type": "machine"
}
},
{
$group: {
_id: "$tags.code"
}
},
{
$group:{
_id: null,
codes: {$push: "$_id"}
}
}
]);
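Why is the $match after $unwind still needed? A leading $match selects whole documents, so it keeps every tag of a document that has at least one "machine" tag. The following plain JavaScript sketch (array methods standing in for the stages; it does not talk to MongoDB) replays that on the sample documents:

```javascript
// Sample documents from the question.
const docs = [
  { _id: "1", tags: [
    { code: "01-01", type: "machine" },
    { code: "04-06", type: "gearbox" },
    { code: "07-01", type: "machine" },
  ] },
  { _id: "2", tags: [
    { code: "03-04", type: "gearbox" },
    { code: "01-01", type: "machine" },
    { code: "04-11", type: "machine" },
  ] },
];

// Leading { $match: { "tags.type": "machine" } }: a document matches
// if ANY element of its tags array has type "machine".
const selected = docs.filter((doc) => doc.tags.some((tag) => tag.type === "machine"));

// { $unwind: "$tags" }: one row per array element.
const unwound = selected.flatMap((doc) => doc.tags.map((tag) => ({ _id: doc._id, tags: tag })));

// Second { $match: { "tags.type": "machine" } }: now tested per element.
const machineOnly = unwound.filter((row) => row.tags.type === "machine");

console.log(selected.length, unwound.length, machineOnly.length); // 2 6 4
```

Both sample documents pass the first $match, yet two of the six unwound rows are gearbox tags; only the second $match removes them.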
> db.foo.aggregate( [
... { $unwind : "$tags" },
... { $match : { "tags.type" : "machine" } },
... { $group : { "_id" : "$tags.code" } },
... { $group : { _id : null , "codes" : {$push : "$_id"} }}
... ] )
{ "_id" : null, "codes" : [ "04-11", "07-01", "01-01" ] }
A better way would be to group directly on tags.type and use $addToSet on tags.code.
Here's how to achieve the same output in 3 aggregation stages:
db.name.aggregate([
{$unwind:"$tags"},
{$match:{"tags.type":"machine"}},
{$group:{_id:"$tags.type","codes":{$addToSet:"$tags.code"}}}
])
Output: { "_id" : "machine", "codes" : [ "04-11", "07-01", "01-01" ] }
Also, if you want the codes for a different tag type, just replace "machine" in the $match stage with the desired tags.type value.
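To see what these pipelines should return for the sample data, here is a plain JavaScript sketch (not the MongoDB API) that mimics $unwind, $match and $addToSet in memory; the function name distinctCodes is made up for illustration.

```javascript
// Sample documents from the question.
const docs = [
  { _id: "1", tags: [
    { code: "01-01", type: "machine" },
    { code: "04-06", type: "gearbox" },
    { code: "07-01", type: "machine" },
  ] },
  { _id: "2", tags: [
    { code: "03-04", type: "gearbox" },
    { code: "01-01", type: "machine" },
    { code: "04-11", type: "machine" },
  ] },
];

// Mimics [{ $unwind: "$tags" }, { $match: { "tags.type": type } },
//         { $group: { _id: "$tags.type", codes: { $addToSet: "$tags.code" } } }].
function distinctCodes(documents, type) {
  const unwound = documents.flatMap((doc) => doc.tags);        // $unwind
  const matched = unwound.filter((tag) => tag.type === type);  // $match
  return [...new Set(matched.map((tag) => tag.code))];         // $addToSet
}

console.log(distinctCodes(docs, "machine")); // [ '01-01', '07-01', '04-11' ]
```

A JavaScript Set keeps insertion order, but MongoDB does not guarantee the order of the elements produced by $addToSet, so the server may list the codes in a different order.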
I want to create an aggregation for the following contents of a collection:
{ "_id": ObjectId("574ffe9bda461e4b4b0043ab"),
"list1": [
{
"_id": "54",
"list2": [
{
"lang": "EN",
"value": "val1"
},
{
"lang": "ES",
"value": "val2"
},
{
"lang": "FR",
"value": "val3"
},
{
"lang": "IT",
"value": "val3"
}
]
}
]
}
From this collection I want to get an object like { "id": "54", "value": "val3" }, where the returned object is selected by the condition list1._id = "54" and list2.lang = "IT".
You can try a simple combination of $match and $unwind to traverse your nested arrays:
db.collection.aggregate([
{
$unwind: "$list1"
},
{
$match: { "list1._id": "54" }
},
{
$unwind: "$list1.list2"
},
{
$match: { "list1.list2.lang": "IT" }
},
{
$project: {
_id: "$list1._id",
val: "$list1.list2.value"
}
}
])
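The same traversal can be traced in plain JavaScript (a sketch, not the driver API), assuming list1 is an array of objects as the pipeline implies; each array method is commented with the stage it stands in for, and the function name pick is hypothetical.

```javascript
// Sample document, with list1 as an array of objects.
const doc = {
  _id: "574ffe9bda461e4b4b0043ab", // stands in for the ObjectId
  list1: [
    { _id: "54", list2: [
      { lang: "EN", value: "val1" },
      { lang: "ES", value: "val2" },
      { lang: "FR", value: "val3" },
      { lang: "IT", value: "val3" },
    ] },
  ],
};

function pick(documents, listId, lang) {
  return documents
    .flatMap((d) => d.list1)                                  // $unwind "$list1"
    .filter((l1) => l1._id === listId)                        // $match list1._id
    .flatMap((l1) => l1.list2.map((l2) => ({ l1, l2 })))      // $unwind "$list1.list2"
    .filter(({ l2 }) => l2.lang === lang)                     // $match list1.list2.lang
    .map(({ l1, l2 }) => ({ _id: l1._id, val: l2.value }));   // $project
}

console.log(pick([doc], "54", "IT")); // [ { _id: '54', val: 'val3' } ]
```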
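If the list1._id field is unique, you can index it and add a $match as the first stage, so that other documents are filtered out (using the index) before $unwind runs. Keep a $match on list1._id after the first $unwind as well, since a matching document may still contain other list1 elements:
db.collection.aggregate([
{
$match: { "list1._id": "54" }
},
{
$unwind: "$list1"
},
{
$match: { "list1._id": "54" }
},
{
$unwind: "$list1.list2"
},
{
$match: { "list1.list2.lang": "IT" }
},
{
$project: {
_id: "$list1._id",
val: "$list1.list2.value"
}
}
])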
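A document-level $match only decides which documents are read; it does not remove individual list1 elements. The plain JavaScript sketch below uses a hypothetical document whose list1 also holds an element "55" to show what would leak through if list1._id were not checked again after $unwind.

```javascript
// Hypothetical document: list1 holds a second element ("55") besides "54".
const doc = {
  _id: "x",
  list1: [
    { _id: "54", list2: [{ lang: "IT", value: "val3" }] },
    { _id: "55", list2: [{ lang: "IT", value: "other" }] },
  ],
};

// Leading $match: keeps the whole document because one element has _id "54".
const selected = [doc].filter((d) => d.list1.some((l1) => l1._id === "54"));

// $unwind "$list1", then $unwind "$list1.list2": one row per list2 entry.
const rows = selected
  .flatMap((d) => d.list1)
  .flatMap((l1) => l1.list2.map((l2) => ({ _id: l1._id, lang: l2.lang, val: l2.value })));

const withoutRematch = rows.filter((r) => r.lang === "IT");
const withRematch = rows.filter((r) => r._id === "54" && r.lang === "IT");

console.log(withoutRematch.length, withRematch.length); // 2 1
```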
I have a list of documents like this
[{
"_id": "5dbc95f921d7625303fe2369",
"name": "John",
"itemsPurchased": [{
"offer": "o1",
"items": ["p1"]
},{
"offer": "o1",
"items": ["p1"]
},
{
"offer": "o1",
"items": ["p2"]
},
{
"offer": "o2",
"items": ["p1"]
}, {
"offer": "o7",
"items": ["p1"]
}
]
},
{
"_id": "zbc95f921d7625303fe2363",
"name": "Doe",
"itemsPurchased": [{
"offer": "o1",
"items": ["p11"]
},{
"offer": "o1",
"items": ["p11"]
},
{
"offer": "o2",
"items": ["p13"]
},
{
"offer": "o1",
"items": ["p22"]
},
{
"offer": "o2",
"items": ["p11"]
}, {
"offer": "o3",
"items": ["p11"]
}
]
}
]
I am trying to count, for each customer, the unique products under each unique offer, expecting the result to be like:
[
{
"_id": "5dbc95f921d7625303fe2369",
"name": "John",
"offersAndProducts": {
"o1":2,
"o2":1,
"o7":1
}
},
{
"_id": "zbc95f921d7625303fe2363",
"name": "Doe",
"offersAndProducts": {
"o1":2,
"o2":2,
"o3":1
}
}
]
I want to apply the aggregation per document. After performing $unwind on itemsPurchased, I applied $group on items and then on offer to eliminate duplicates:
{
"$group" : {
"_id" : {
"item" : {
"$arrayElemAt" : [
"$itemsPurchased.items",
0.0
]
},
"offer" : "$itemsPurchased.offer"
},
"count" : {
"$sum" : 1.0
}
}
}
then,
{
"$group" : {
"_id" : "$_id.offer",
"count" : {
"$sum" : 1.0
}
}
}
This gives the counts of unique products per offer, but for all documents combined:
[
{o1:4,o2:3,o3:1,o7:1}
]
But I need it per document.
I tried $addFields, but $unwind and $match are not valid inside it and give an error.
Is there another way to achieve this?
Generally speaking, it's an anti-pattern to $unwind an array and then to $group on the original _id since most operations can be done on the array directly, in a single stage. Here is what such a stage would look like:
{$addFields:{
offers:{$arrayToObject:{
$map:{
input:{$setUnion:"$itemsPurchased.offer"},
as:"o",
in:[
"$$o",
{$size:{$setUnion:{$let:{
vars:{items:{$filter:{
input:"$itemsPurchased",
cond:{$eq:["$$this.offer","$$o"]}
}}},
in:{$reduce:{
input:"$$items",
initialValue:[],
in:{$concatArrays:["$$value","$$this.items"]}
}}
}}}
}]
}
}}
}}
This builds an array of two-element [key, value] arrays, which $arrayToObject converts into an object (the first element becomes the field name, the second its value). The $map input is the set of unique offers; for each offer, the stage collects the items of every matching itemsPurchased entry, removes duplicates with $setUnion, and takes the $size of the result. On your second sample document (Doe) it produces:
"offers" : {
"o1" : 2,
"o2" : 2,
"o3" : 1
}
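For readers less used to nested aggregation expressions, here is the same per-document logic written as plain JavaScript (a sketch for illustration, not MongoDB code; offersPerDoc is a made-up name). Each line is commented with the operator it mirrors.

```javascript
// For each distinct offer, collect the items of matching entries,
// de-duplicate them and count them.
function offersPerDoc(doc) {
  const offers = [...new Set(doc.itemsPurchased.map((p) => p.offer))]; // $setUnion on offers
  return Object.fromEntries(                                           // $arrayToObject
    offers.map((o) => {
      const items = doc.itemsPurchased
        .filter((p) => p.offer === o)                                  // $filter
        .reduce((acc, p) => acc.concat(p.items), []);                  // $reduce + $concatArrays
      return [o, new Set(items).size];                                 // $setUnion + $size
    })
  );
}

const doe = { _id: "zbc95f921d7625303fe2363", name: "Doe", itemsPurchased: [
  { offer: "o1", items: ["p11"] }, { offer: "o1", items: ["p11"] },
  { offer: "o2", items: ["p13"] }, { offer: "o1", items: ["p22"] },
  { offer: "o2", items: ["p11"] }, { offer: "o3", items: ["p11"] },
] };

console.log(offersPerDoc(doe)); // { o1: 2, o2: 2, o3: 1 }
```

The counts should agree with the server; only the key order may differ, since $setUnion does not guarantee element order.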
You need to run $unwind and $group twice. To count only unique items you can use $addToSet. To build your keys dynamically you need to use $arrayToObject:
db.collection.aggregate([
{
$unwind: "$itemsPurchased"
},
{
$unwind: "$itemsPurchased.items"
},
{
$group: {
_id: {
_id: "$_id",
offer: "$itemsPurchased.offer"
},
name: { $first: "$name" },
items: { $addToSet: "$itemsPurchased.items" }
}
},
{
$group: {
_id: "$_id._id",
name: { $first: "$name" },
offersAndProducts: { $push: { k: "$_id.offer", v: { $size: "$items" } } }
}
},
{
$project: {
_id: 1,
name: 1,
offersAndProducts: { $arrayToObject: "$offersAndProducts" }
}
}
])
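As a cross-check, here is a plain JavaScript sketch (not MongoDB code; offersAndProducts is a hypothetical helper) that follows the same five stages on the first sample customer, using Maps in place of the two $group stages.

```javascript
function offersAndProducts(docs) {
  // $unwind itemsPurchased, then $unwind itemsPurchased.items: one row per item.
  const rows = docs.flatMap((d) =>
    d.itemsPurchased.flatMap((p) =>
      p.items.map((item) => ({ _id: d._id, name: d.name, offer: p.offer, item }))));

  // First $group on { _id, offer }: $first name, $addToSet item.
  const byDocOffer = new Map();
  for (const r of rows) {
    const key = JSON.stringify([r._id, r.offer]);
    if (!byDocOffer.has(key)) {
      byDocOffer.set(key, { _id: r._id, name: r.name, offer: r.offer, items: new Set() });
    }
    byDocOffer.get(key).items.add(r.item);
  }

  // Second $group on _id pushing { k: offer, v: $size }, then $arrayToObject.
  const byDoc = new Map();
  for (const g of byDocOffer.values()) {
    if (!byDoc.has(g._id)) byDoc.set(g._id, { _id: g._id, name: g.name, offersAndProducts: {} });
    byDoc.get(g._id).offersAndProducts[g.offer] = g.items.size;
  }
  return [...byDoc.values()];
}

const john = { _id: "5dbc95f921d7625303fe2369", name: "John", itemsPurchased: [
  { offer: "o1", items: ["p1"] }, { offer: "o1", items: ["p1"] },
  { offer: "o1", items: ["p2"] }, { offer: "o2", items: ["p1"] },
  { offer: "o7", items: ["p1"] },
] };

console.log(offersAndProducts([john]));
```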
I have an Asset collection with data like:
{
"_id" : ObjectId("5bfb962ee2a301554915"),
"users" : [
"abc.abc#abc.com",
"abc.xyz#xyz.com"
],
"remote" : {
"source" : "dropbox",
"bytes" : 1234
}
}
{
"_id" : ObjectId("5bfb962ee2a301554915"),
"users" : [
"pqr.pqr#pqr.com"
],
"remote" : {
"source" : "google_drive",
"bytes" : 785
}
}
{
"_id" : ObjectId("5bfb962ee2a301554915"),
"users" : [
"abc.abc#abc.com",
"abc.xyz#xyz.com"
],
"remote" : {
"source" : "gmail",
"bytes" : 5647
}
}
What I am looking for is to group by user and get the total bytes for each source, like:
{
"_id" : "abc.abc#abc.com",
"bytes" : {
"google_drive": 1458,
"dropbox" : 1254
}
}
I can't figure out how to get the nested output using $group.
I have tried this query:
db.asset.aggregate(
[
{$unwind : '$users'},
{$group:{
_id:
{'username': "$users",
'source': "$remote.source",
'total': {$sum: "$remote.bytes"}} }
}
]
)
With this query, the username is repeated in the results.
With MongoDB 3.6 and newer, you can use the $arrayToObject operator within a $mergeObjects expression in a $replaceRoot stage to get the desired result.
Run the following aggregation pipeline:
db.asset.aggregate([
{ "$unwind": "$users" },
{ "$group": {
"_id": {
"users": "$users",
"source": "$remote.source"
},
"totalBytes": { "$sum": "$remote.bytes" }
} },
{ "$group": {
"_id": "$_id.users",
"counts": {
"$push": {
"k": "$_id.source",
"v": "$totalBytes"
}
}
} },
{ "$replaceRoot": {
"newRoot": {
"$mergeObjects": [
{ "bytes": { "$arrayToObject": "$counts" } },
"$$ROOT"
]
}
} },
{ "$project": { "counts": 0 } }
])
which yields
/* 1 */
{
"bytes" : {
"gmail" : 5647.0,
"dropbox" : 1234.0
},
"_id" : "abc.abc#abc.com"
}
/* 2 */
{
"bytes" : {
"google_drive" : 785.0
},
"_id" : "pqr.pqr#pqr.com"
}
/* 3 */
{
"bytes" : {
"gmail" : 5647.0,
"dropbox" : 1234.0
},
"_id" : "abc.xyz#xyz.com"
}
using the above sample documents.
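Here is a plain JavaScript sketch (not MongoDB code; bytesPerUser is a hypothetical name) that applies the same unwind-and-group logic to the three sample assets, which may help when checking the output above.

```javascript
const assets = [
  { users: ["abc.abc#abc.com", "abc.xyz#xyz.com"], remote: { source: "dropbox", bytes: 1234 } },
  { users: ["pqr.pqr#pqr.com"], remote: { source: "google_drive", bytes: 785 } },
  { users: ["abc.abc#abc.com", "abc.xyz#xyz.com"], remote: { source: "gmail", bytes: 5647 } },
];

function bytesPerUser(docs) {
  const totals = new Map(); // user -> { source: totalBytes }
  for (const asset of docs) {
    for (const user of asset.users) {                                   // $unwind "$users"
      const perSource = totals.get(user) ?? {};
      const src = asset.remote.source;
      perSource[src] = (perSource[src] ?? 0) + asset.remote.bytes;      // $group { users, source } + $sum
      totals.set(user, perSource);
    }
  }
  // Second $group + $arrayToObject + $replaceRoot: one { _id, bytes } per user.
  return [...totals].map(([user, bytes]) => ({ _id: user, bytes }));
}

console.log(bytesPerUser(assets));
```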
You have to use $group twice here: first on the user and the source, totalling the bytes with $sum,
and then on the user alone, using $push to collect the source and the bytes into an array:
db.collection.aggregate([
{ "$unwind": "$users" },
{ "$group": {
"_id": {
"users": "$users",
"source": "$remote.source"
},
"bytes": { "$sum": "$remote.bytes" }
}},
{ "$group": {
"_id": "$_id.users",
"data": {
"$push": {
"source": "$_id.source",
"bytes": "$bytes"
}
}
}}
])
If you want the source and the bytes in key-value format instead, replace the last $group stage with the following two stages:
{ "$group": {
"_id": "$_id.users",
"data": {
"$push": {
"k": "$_id.source",
"v": "$bytes"
}
}
}},
{ "$project": {
"_id": 0,
"username": "$_id",
"bytes": { "$arrayToObject": "$data" }
}}
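The k and v field names matter because $arrayToObject expects either documents with k and v fields or two-element [key, value] arrays. What the final $project does to one grouped document can be sketched in plain JavaScript (values taken from the sample assets; projectStage is a hypothetical name):

```javascript
// One document as it might come out of the $group stage above.
const grouped = { _id: "abc.abc#abc.com", data: [
  { k: "dropbox", v: 1234 },
  { k: "gmail", v: 5647 },
] };

// Mirrors { $project: { _id: 0, username: "$_id", bytes: { $arrayToObject: "$data" } } }.
function projectStage(doc) {
  return {
    username: doc._id,
    bytes: Object.fromEntries(doc.data.map(({ k, v }) => [k, v])), // { k, v } pairs -> object
  };
}

console.log(projectStage(grouped));
```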
Basically, the structure is:
{
"_id" : ObjectId("123123"),
"stores" : [
{
"messages" : [
{
"updated_time" : "2018-05-15T05:12:25+0000",
"message_count" : 4,
"thread_id" : "123",
"messages" : [
{
"message" : "Hi User ",
"created_time" : "2018-05-15T05:12:25+0000",
"message_id" : "111",
},
{
"message" : "This is tes",
"created_time" : "2018-05-15T05:12:21+0000",
"message_id" : "222",
}
]
},
],
"store_id" : "123"
}
]
}
I want to get the message object with message_id 111, and I have these values to find it:
store_id: 123,
thread_id: 123,
message_id: 111
How can I get this object? Any idea or help will be appreciated. Thanks
The simplest way would be to $unwind all the nested arrays and then use $match to get a single document. You can also add $replaceRoot to return only the nested document. Try:
db.collection.aggregate([
{ $unwind: "$stores" },
{ $unwind: "$stores.messages" },
{ $unwind: "$stores.messages.messages" },
{ $match: { "stores.store_id": "123", "stores.messages.thread_id": "123", "stores.messages.messages.message_id": "111" } },
{ $replaceRoot: { newRoot: "$stores.messages.messages" } }
])
Prints:
{
"created_time": "2018-05-15T05:12:25+0000",
"message": "Hi User ",
"message_id": "111"
}
To improve performance, you can use $match after every $unwind to filter out unnecessary data as early as possible. Try:
db.collection.aggregate([
{ $unwind: "$stores" },
{ $match: { "stores.store_id": "123" } },
{ $unwind: "$stores.messages" },
{ $match: { "stores.messages.thread_id": "123" } },
{ $unwind: "$stores.messages.messages" },
{ $match: { "stores.messages.messages.message_id": "111" } },
{ $replaceRoot: { newRoot: "$stores.messages.messages" } }
])
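The filter-after-each-$unwind idea maps naturally onto nested loops that skip non-matching branches early. A plain JavaScript sketch (not MongoDB code; findMessages is a hypothetical name, and the ObjectId is replaced by a string):

```javascript
const doc = {
  _id: "123123", // stands in for ObjectId("123123")
  stores: [{
    store_id: "123",
    messages: [{
      updated_time: "2018-05-15T05:12:25+0000",
      message_count: 4,
      thread_id: "123",
      messages: [
        { message: "Hi User ", created_time: "2018-05-15T05:12:25+0000", message_id: "111" },
        { message: "This is tes", created_time: "2018-05-15T05:12:21+0000", message_id: "222" },
      ],
    }],
  }],
};

function findMessages(docs, storeId, threadId, messageId) {
  const results = [];
  for (const d of docs) {
    for (const store of d.stores) {                   // $unwind "$stores"
      if (store.store_id !== storeId) continue;       // $match stores.store_id
      for (const thread of store.messages) {          // $unwind "$stores.messages"
        if (thread.thread_id !== threadId) continue;  // $match stores.messages.thread_id
        for (const msg of thread.messages) {          // $unwind "$stores.messages.messages"
          if (msg.message_id === messageId) results.push(msg); // $match + $replaceRoot
        }
      }
    }
  }
  return results;
}

console.log(findMessages([doc], "123", "123", "111"));
```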
I am trying to calculate the total of saleValue, counting only the entries where that field exists, but my query is not working correctly. Can somebody help me solve this problem? I have attached two sample documents below; please take a look and suggest the best solution.
Document : 1
{
"_id" : 1,
"message_count" : 4,
"messages" : {
"data" : [
{
"id" : "11",
"saleValue": 1000
},
{
"id" : "112",
"saleValue": 1400
},
{
"id" : "22",
},
{
"id" : "234",
"saleValue": 111
}
],
},
"createdTime" : ISODate("2018-03-18T10:18:48.000Z")
}
Document : 2
{
"_id" : 444,
"message_count" : 4,
"messages" : {
"data" : [
{
"id" : "444",
"saleValue" : 2060
},
{
"id" : "444",
},
{
"id" : 234,
"saleValue" : 260
},
{
"id" : "34534",
}
]
},
"createdTime" : ISODate("2018-03-18T03:11:50.000Z")
}
Needed Output:
{
total : 4831
}
My query:
db.getCollection('myCollection').aggregate([
{
"$group": {
"_id": "$Id",
"totalValue": {
$sum: {
$sum: "$messages.data.saleValue"
}
}
}
}
])
Please help me solve this problem if possible. Thanks in advance.
It's not working correctly because you are grouping on the constant "_id": "tempId", which aggregates all the documents in the collection into a single group; you just need to reference the field by adding the $ prefix:
db.getCollection('myCollection').aggregate([
{ "$group": {
"_id": "$tempId",
"totalValue": {
"$sum": { "$sum": "$messages.data.saleValue" }
}
} }
])
This is, in essence, a single-stage version of a pipeline that first adds a field holding each document's sum and then totals that field in the $group stage.
The above works because, from MongoDB 3.2, $sum is available in both the $project and $group stages. In the $project stage, when $sum is given a single expression that resolves to an array, it returns the sum of that array's numeric elements. The expression "$messages.data.saleValue" resolves to an array of numbers (for the first sample document, [1000, 1400, 111]; the element without saleValue contributes nothing), which is then used as the $sum expression:
db.getCollection('myCollection').aggregate([
{ "$project": {
"values": { "$sum": "$messages.data.saleValue" },
"tempId": 1,
} },
{ "$group": {
"_id": "$tempId",
"totalValue": { "$sum": "$values" }
} }
])
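To make the two levels of $sum concrete, here is a plain JavaScript sketch (not MongoDB code) on the question's two documents: an inner sum per document, as in $project, and an outer sum across documents, as in $group. Missing saleValue entries are simply skipped, as they are when MongoDB resolves the field path.

```javascript
const docs = [
  { _id: 1, messages: { data: [
    { id: "11", saleValue: 1000 }, { id: "112", saleValue: 1400 },
    { id: "22" }, { id: "234", saleValue: 111 },
  ] } },
  { _id: 444, messages: { data: [
    { id: "444", saleValue: 2060 }, { id: "444" },
    { id: 234, saleValue: 260 }, { id: "34534" },
  ] } },
];

// "$messages.data.saleValue": keep only the numeric saleValue entries.
const saleValues = (doc) =>
  doc.messages.data.map((entry) => entry.saleValue).filter((v) => typeof v === "number");
const sum = (values) => values.reduce((a, b) => a + b, 0);

const perDocument = docs.map((doc) => sum(saleValues(doc))); // inner $sum (expression)
const total = sum(perDocument);                              // outer $sum (accumulator)

console.log(perDocument, total); // [ 2511, 2320 ] 4831
```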
You can add an $unwind before your $group; that deconstructs the data array, and then you can group properly:
db.myCollection.aggregate([
{
"$unwind": "$messages.data"
},
{
"$group": {
"_id": "tempId",
"totalValue": {
$sum: "$messages.data.saleValue"
}
}
}
])
Output:
{ "_id" : "tempId", "totalValue" : 4831 }
db.getCollection('myCollection').aggregate([
{ $unwind: "$messages.data" },
{
$group: {
"_id": "tempId",
"totalValue": { $sum: "$messages.data.saleValue" }
}
}
])
Based on the description in the question, please try the following aggregate query:
db.myCollection.aggregate(
// Pipeline
[
// Stage 1
{
$unwind: {
path: '$messages.data'
}
},
// Stage 2
{
$group: {
_id: {
pageId: '$pageId'
},
total: {
$sum: '$messages.data.saleValue'
}
}
},
// Stage 3
{
$project: {
pageId: '$_id.pageId',
total: 1,
_id: 0
}
}
]
);
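You can do it without using $group; with grouping, every other field you need has to be carried through and handled explicitly. So I prefer using $sum with $map, as shown below. Note that this adds a total to each document rather than one total for the whole collection: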
db.getCollection('myCollection').aggregate([
{
$addFields: {
total: {
$sum: {
$map: {
input: "$messages.data",
as: "message",
in: "$$message.saleValue",
},
},
},
},
},
])
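A plain JavaScript sketch (not MongoDB code) of what this $addFields stage computes on the two sample documents. It assumes, as I understand MongoDB's behaviour, that a missing saleValue shows up as null in the $map output and that $sum ignores non-numeric values; summing the resulting per-document totals (for example with a following $group) would give the 4831 asked for.

```javascript
const docs = [
  { _id: 1, messages: { data: [
    { id: "11", saleValue: 1000 }, { id: "112", saleValue: 1400 },
    { id: "22" }, { id: "234", saleValue: 111 },
  ] } },
  { _id: 444, messages: { data: [
    { id: "444", saleValue: 2060 }, { id: "444" },
    { id: 234, saleValue: 260 }, { id: "34534" },
  ] } },
];

// $map: one output slot per element; a missing saleValue becomes null (assumption).
const mapSaleValues = (doc) => doc.messages.data.map((message) => message.saleValue ?? null);

// $sum: skips non-numeric values such as null.
const sumNumeric = (values) =>
  values.reduce((acc, v) => (typeof v === "number" ? acc + v : acc), 0);

// $addFields: every document keeps its fields and gains a "total".
const withTotals = docs.map((doc) => ({ ...doc, total: sumNumeric(mapSaleValues(doc)) }));

console.log(withTotals.map((doc) => doc.total)); // [ 2511, 2320 ]
```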