Distinct array element with condition - mongodb

My documents look like this:
{
"_id": "1",
"tags": [
{ "code": "01-01", "type": "machine" },
{ "code": "04-06", "type": "gearbox" },
{ "code": "07-01", "type": "machine" }
]
},
{
"_id": "2",
"tags": [
{ "code": "03-04","type": "gearbox" },
{ "code": "01-01", "type": "machine" },
{ "code": "04-11", "type": "machine" }
]
}
I want to get distinct codes only for tags whose type is "machine". So, for the example above, the result should be ["01-01", "07-01", "04-11"].
How do I do this?

Using $unwind and then $group with the tag as the key will give you each tag in a separate document in your result set:
db.collection_name.aggregate([
{
$unwind: "$tags"
},
{
$match: {
"tags.type": "machine"
}
},
{
$group: {
_id: "$tags.code"
}
},
{
$project:{
_id: false,
code: "$_id"
}
}
]);
Or, if you want them put into an array within a single document, you can use $push within a second $group stage:
db.collection_name.aggregate([
{
$unwind: "$tags"
},
{
$match: {
"tags.type": "machine"
}
},
{
$group: {
_id: "$tags.code"
}
},
{
$group:{
_id: null,
codes: {$push: "$_id"}
}
}
]);
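For intuition, the $unwind, $match and $group steps can be traced in plain JavaScript on the two sample documents. This is only an in-memory sketch of what the pipeline computes, not MongoDB code; the names docs and distinctMachineCodes are made up for the illustration. Note that $group does not guarantee any particular order of the resulting codes.

```javascript
// Sample documents from the question.
const docs = [
  { _id: "1", tags: [
    { code: "01-01", type: "machine" },
    { code: "04-06", type: "gearbox" },
    { code: "07-01", type: "machine" },
  ] },
  { _id: "2", tags: [
    { code: "03-04", type: "gearbox" },
    { code: "01-01", type: "machine" },
    { code: "04-11", type: "machine" },
  ] },
];

// Hypothetical helper mirroring $unwind -> $match -> $group.
function distinctMachineCodes(documents) {
  // $unwind: one entry per tag
  const unwound = documents.flatMap((d) => d.tags);
  // $match on tags.type
  const machines = unwound.filter((t) => t.type === "machine");
  // $group on tags.code: a Set keeps each code once
  return [...new Set(machines.map((t) => t.code))];
}

console.log(distinctMachineCodes(docs));
```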
Another user suggested including an initial stage of { $match: { "tags.type": "machine" } }. This is a good idea if your data is likely to contain a significant number of documents that do not include "machine" tags. That way you will eliminate unnecessary processing of those documents. Your pipeline would look like this:
db.collection_name.aggregate([
{
$match: {
"tags.type": "machine"
}
},
{
$unwind: "$tags"
},
{
$match: {
"tags.type": "machine"
}
},
{
$group: {
_id: "$tags.code"
}
},
{
$group:{
_id: null,
codes: {$push: "$_id"}
}
}
]);
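Both $match stages are needed because a query on "tags.type" matches a document as soon as any element of the array has that type. The first $match therefore only discards whole documents, and the second one discards the non-machine tags that remain after $unwind. A plain JavaScript sketch of that difference follows; it runs in memory only, and docs is a made-up three-document sample in which one document has no machine tag.

```javascript
// Hypothetical sample: document "2" has no machine tag at all.
const docs = [
  { _id: "1", tags: [{ code: "01-01", type: "machine" }, { code: "04-06", type: "gearbox" }] },
  { _id: "2", tags: [{ code: "03-04", type: "gearbox" }] },
  { _id: "3", tags: [{ code: "04-11", type: "machine" }] },
];

// First $match: keep a document if ANY of its tags is a machine.
const kept = docs.filter((d) => d.tags.some((t) => t.type === "machine"));

// $unwind: one { _id, tags } entry per array element.
const unwound = kept.flatMap((d) => d.tags.map((t) => ({ _id: d._id, tags: t })));

// Second $match: drop the gearbox tag that survived inside document "1".
const machinesOnly = unwound.filter((e) => e.tags.type === "machine");

console.log(kept.length, unwound.length, machinesOnly.length);
```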

> db.foo.aggregate( [
... { $unwind : "$tags" },
... { $match : { "tags.type" : "machine" } },
... { $group : { "_id" : "$tags.code" } },
... { $group : { _id : null , "codes" : {$push : "$_id"} }}
... ] )
{ "_id" : null, "codes" : [ "04-11", "07-01", "01-01" ] }

A better way would be to group directly on tags.type and use $addToSet on tags.code.
Here's how we can achieve the same output in three aggregation stages:
db.name.aggregate([
{$unwind:"$tags"},
{$match:{"tags.type":"machine"}},
{$group:{_id:"$tags.type","codes":{$addToSet:"$tags.code"}}}
])
Output : { "_id" : "machine", "codes" : [ "04-11", "07-01", "01-01" ] }
Also, to get the codes for a different tag type, just replace "machine" in the $match stage with the desired tags.type value.
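$addToSet keeps each value once per group, so a single $group does the de-duplication. A rough in-memory equivalent in plain JavaScript uses a Map of Sets keyed by tag type. The helper name codesByType is hypothetical, and the sketch returns one group per type, which is what the pipeline would produce with the $match stage left out.

```javascript
const docs = [
  { _id: "1", tags: [
    { code: "01-01", type: "machine" },
    { code: "04-06", type: "gearbox" },
    { code: "07-01", type: "machine" },
  ] },
  { _id: "2", tags: [
    { code: "03-04", type: "gearbox" },
    { code: "01-01", type: "machine" },
    { code: "04-11", type: "machine" },
  ] },
];

// Hypothetical helper: group on tags.type, collect tags.code into a set.
function codesByType(documents) {
  const groups = new Map();
  for (const doc of documents) {
    for (const tag of doc.tags) {
      if (!groups.has(tag.type)) groups.set(tag.type, new Set());
      groups.get(tag.type).add(tag.code); // duplicates are ignored, like $addToSet
    }
  }
  // Convert Map<type, Set<code>> into a plain object of arrays.
  return Object.fromEntries([...groups].map(([type, codes]) => [type, [...codes]]));
}

const result = codesByType(docs);
console.log(result.machine, result.gearbox);
```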

Related

MongoDB - aggregation - array of array data selection

I want to create an aggregation for the following contents of a collection:
{ "_id": ObjectId("574ffe9bda461e4b4b0043ab"),
"list1": [
{
"_id": "54",
"list2": [
{
"lang": "EN",
"value": "val1"
},
{
"lang": "ES",
"value": "val2"
},
{
"lang": "FR",
"value": "val3"
},
{
"lang": "IT",
"value": "val3"
}
]
}
]
}
From this collection I want to get an object ("id": "54", "value": "val3"). The returned object is based on the condition list1._id = "54" and list2.lang = "IT".
You can try a simple combination of $match and $unwind to traverse your nested arrays:
db.collection.aggregate([
{
$unwind: "$list1"
},
{
$match: { "list1._id": "54" }
},
{
$unwind: "$list1.list2"
},
{
$match: { "list1.list2.lang": "IT" }
},
{
$project: {
_id: "$list1._id",
val: "$list1.list2.value"
}
}
])
Mongo Playground.
If the list1._id field is unique, you can index it and swap the first two pipeline stages to filter out other documents before running $unwind:
db.collection.aggregate([
{
$match: { "list1._id": "54" }
},
{
$unwind: "$list1"
},
{
$unwind: "$list1.list2"
},
{
$match: { "list1.list2.lang": "IT" }
},
{
$project: {
_id: "$list1._id",
val: "$list1.list2.value"
}
}
])
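What the two $unwind/$match pairs compute can be written as nested loops. Below is a minimal plain JavaScript sketch on a well-formed version of the sample document. The helper findValue is hypothetical, and the ObjectId is replaced by a plain string so the snippet runs without a driver.

```javascript
const docs = [
  { _id: "574ffe9bda461e4b4b0043ab", list1: [
    { _id: "54", list2: [
      { lang: "EN", value: "val1" },
      { lang: "ES", value: "val2" },
      { lang: "FR", value: "val3" },
      { lang: "IT", value: "val3" },
    ] },
  ] },
];

// Hypothetical helper mirroring the pipeline stage by stage.
function findValue(documents, id, lang) {
  const out = [];
  for (const doc of documents) {
    for (const outer of doc.list1) {          // $unwind: "$list1"
      if (outer._id !== id) continue;         // $match on list1._id
      for (const inner of outer.list2) {      // $unwind: "$list1.list2"
        if (inner.lang !== lang) continue;    // $match on list1.list2.lang
        out.push({ _id: outer._id, val: inner.value }); // $project
      }
    }
  }
  return out;
}

console.log(findValue(docs, "54", "IT"));
```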

Aggregation at each document level mongodb

I have a list of documents like this
[{
"_id": "5dbc95f921d7625303fe2369",
"name": "John",
"itemsPurchased": [{
"offer": "o1",
"items": ["p1"]
},{
"offer": "o1",
"items": ["p1"]
},
{
"offer": "o1",
"items": ["p2"]
},
{
"offer": "o2",
"items": ["p1"]
}, {
"offer": "o7",
"items": ["p1"]
}
]
},
{
"_id": "zbc95f921d7625303fe2363",
"name": "Doe",
"itemsPurchased": [{
"offer": "o1",
"items": ["p11"]
},{
"offer": "o1",
"items": ["p11"]
},
{
"offer": "o2",
"items": ["p13"]
},
{
"offer": "o1",
"items": ["p22"]
},
{
"offer": "o2",
"items": ["p11"]
}, {
"offer": "o3",
"items": ["p11"]
}
]
}
]
And I am trying to count the unique products per offer for each customer. I expect a result like:
[
{
"_id": "5dbc95f921d7625303fe2369",
"name": "John",
"offersAndProducts": {
"o1":2,
"o2":1,
"o7":1
}
},
{
"_id": "zbc95f921d7625303fe2363",
"name": "Doe",
"offersAndProducts": {
"o1":2,
"o2":2,
"o3":1
}
}
]
I want to apply the aggregation per document. After performing $unwind on itemsPurchased, I applied $group on items and then on offer to eliminate the duplication:
{
"$group" : {
"_id" : {
"item" : {
"$arrayElemAt" : [
"$itemsPurchased.item",
0.0
]
},
"count" : {
"$sum" : 1.0
},
"offer" : "$itemsPurchased.offer"
}
}
}
then,
{
"$group" : {
"_id" : "$_id.offer",
"count" : {
"$sum" : 1.0
}
}
}
this gives the array of products and offers for all documents:
[
{o1:4,o2:3,o3:1,o7:1}
]
But I need it at the document level.
I tried $addFields, but using the $unwind and $match operators there gives an error.
Any other way of achieving this?
Generally speaking, it's an anti-pattern to $unwind an array and then to $group on the original _id since most operations can be done on the array directly, in a single stage. Here is what such a stage would look like:
{$addFields:{
offers:{$arrayToObject:{
$map:{
input:{$setUnion:"$itemsPurchased.offer"},
as:"o",
in:[
"$$o",
{$size:{$setUnion:{$let:{
vars:{items:{$filter:{
input:"$itemsPurchased",
cond:{$eq:["$$this.offer","$$o"]}
}}},
in:{$reduce:{
input:"$$items",
initialValue:[],
in:{$concatArrays:["$$value","$$this.items"]}
}}
}}}
}]
}
}}
}}
This builds an array in which each element is a two-element array, a form that $arrayToObject can convert to an object where the first element is the key name and the second is the value. The input is the unique set of offers; for each offer we accumulate an array of products, get rid of duplicates (with $setUnion) and then take the size of the result. For the second sample document ("Doe") this produces:
"offers" : {
"o1" : 2,
"o2" : 2,
"o3" : 1
}
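The same per-document computation may be easier to follow in plain JavaScript. The sketch below is not MongoDB code; it mirrors the expression step by step (unique offers, filter by offer, flatten the items, de-duplicate, count) on a compact copy of the sample data. The names customers and offersFor are hypothetical.

```javascript
const customers = [
  { name: "John", itemsPurchased: [
    { offer: "o1", items: ["p1"] }, { offer: "o1", items: ["p1"] },
    { offer: "o1", items: ["p2"] }, { offer: "o2", items: ["p1"] },
    { offer: "o7", items: ["p1"] },
  ] },
  { name: "Doe", itemsPurchased: [
    { offer: "o1", items: ["p11"] }, { offer: "o1", items: ["p11"] },
    { offer: "o2", items: ["p13"] }, { offer: "o1", items: ["p22"] },
    { offer: "o2", items: ["p11"] }, { offer: "o3", items: ["p11"] },
  ] },
];

// Hypothetical helper: number of unique products per offer for one customer.
function offersFor(customer) {
  // $setUnion on "$itemsPurchased.offer": the unique offers
  const offers = [...new Set(customer.itemsPurchased.map((p) => p.offer))];
  const pairs = offers.map((offer) => {
    const items = customer.itemsPurchased
      .filter((p) => p.offer === offer)  // $filter on the offer
      .flatMap((p) => p.items);          // $reduce with $concatArrays
    return [offer, new Set(items).size]; // $setUnion, then $size
  });
  return Object.fromEntries(pairs);      // $arrayToObject
}

for (const c of customers) console.log(c.name, offersFor(c));
```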
You need to run $unwind and $group twice. To count only unique items you can use $addToSet. To build your keys dynamically you need to use $arrayToObject:
db.collection.aggregate([
{
$unwind: "$itemsPurchased"
},
{
$unwind: "$itemsPurchased.items"
},
{
$group: {
_id: {
_id: "$_id",
offer: "$itemsPurchased.offer"
},
name: { $first: "$name" },
items: { $addToSet: "$itemsPurchased.items" }
}
},
{
$group: {
_id: "$_id._id",
name: { $first: "$name" },
offersAndProducts: { $push: { k: "$_id.offer", v: { $size: "$items" } } }
}
},
{
$project: {
_id: 1,
name: 1,
offersAndProducts: { $arrayToObject: "$offersAndProducts" }
}
}
])
Mongo Playground
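$arrayToObject accepts either an array of { k, v } documents (the shape pushed by the second $group above) or an array of two-element [key, value] arrays. A small plain JavaScript sketch of that conversion follows; the function arrayToObject here is a hypothetical stand-in written for illustration, not a driver API.

```javascript
// Hypothetical stand-in for the $arrayToObject operator.
function arrayToObject(entries) {
  const result = {};
  for (const entry of entries) {
    // Accept both [key, value] pairs and { k, v } documents.
    const [key, value] = Array.isArray(entry) ? entry : [entry.k, entry.v];
    result[key] = value;
  }
  return result;
}

console.log(arrayToObject([{ k: "o1", v: 2 }, { k: "o2", v: 1 }]));
console.log(arrayToObject([["o1", 2], ["o7", 1]]));
```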

MongoDB aggregate nested grouping

I have an Asset collection which has data like
{
"_id" : ObjectId("5bfb962ee2a301554915"),
"users" : [
"abc.abc#abc.com",
"abc.xyz#xyz.com"
],
"remote" : {
"source" : "dropbox",
"bytes" : 1234
}
}
{
"_id" : ObjectId("5bfb962ee2a301554915"),
"users" : [
"pqr.pqr#pqr.com"
],
"remote" : {
"source" : "google_drive",
"bytes" : 785
}
}
{
"_id" : ObjectId("5bfb962ee2a301554915"),
"users" : [
"abc.abc#abc.com",
"abc.xyz#xyz.com"
],
"remote" : {
"source" : "gmail",
"bytes" : 5647
}
}
What I am looking for is to group by user and get the total bytes per source, like
{
"_id" : "abc.abc#abc.com",
"bytes" : {
"google_drive": 1458,
"dropbox" : 1254
}
}
I am not getting how to get the nested output using grouping.
I have tried with the query
db.asset.aggregate(
[
{$unwind : '$users'},
{$group:{
_id:
{'username': "$users",
'source': "$remote.source",
'total': {$sum: "$remote.bytes"}} }
}
]
)
This way I am getting the result with the repeated username.
With MongoDB 3.6 and newer, you can use the $arrayToObject operator within a $mergeObjects expression and a $replaceRoot stage to get the desired result.
You would need to run the following aggregate pipeline though:
db.asset.aggregate([
{ "$unwind": "$users" },
{ "$group": {
"_id": {
"users": "$users",
"source": "$remote.source"
},
"totalBytes": { "$sum": "$remote.bytes" }
} },
{ "$group": {
"_id": "$_id.users",
"counts": {
"$push": {
"k": "$_id.source",
"v": "$totalBytes"
}
}
} },
{ "$replaceRoot": {
"newRoot": {
"$mergeObjects": [
{ "bytes": { "$arrayToObject": "$counts" } },
"$$ROOT"
]
}
} },
{ "$project": { "counts": 0 } }
])
which yields
/* 1 */
{
"bytes" : {
"gmail" : 5647.0,
"dropbox" : 1234.0
},
"_id" : "abc.abc#abc.com"
}
/* 2 */
{
"bytes" : {
"google_drive" : 785.0
},
"_id" : "pqr.pqr#pqr.com"
}
/* 3 */
{
"bytes" : {
"gmail" : 5647.0,
"dropbox" : 1234.0
},
"_id" : "abc.xyz#xyz.com"
}
using the above sample documents.
You have to use $group a couple of times here. First group by user and source, and compute the total number of bytes using $sum.
Then group by user and $push the source and the bytes into an array:
db.collection.aggregate([
{ "$unwind": "$users" },
{ "$group": {
"_id": {
"users": "$users",
"source": "$remote.source"
},
"bytes": { "$sum": "$remote.bytes" }
}},
{ "$group": {
"_id": "$_id.users",
"data": {
"$push": {
"source": "$_id.source",
"bytes": "$bytes"
}
}
}}
])
If you also want to convert the source and the bytes into key-value format, replace the last $group stage with the two stages below.
{ "$group": {
"_id": "$_id.users",
"data": {
"$push": {
"k": "$_id.source",
"v": "$bytes"
}
}
}},
{ "$project": {
"_id": 0,
"username": "$_id",
"bytes": { "$arrayToObject": "$data" }
}}
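The two $group stages amount to a nested tally: first per (user, source), then per user. Here is a plain JavaScript sketch on the sample documents. It runs in memory only, bytesByUser is a hypothetical name, and the ObjectIds are omitted because they play no role in the grouping.

```javascript
const assets = [
  { users: ["abc.abc#abc.com", "abc.xyz#xyz.com"], remote: { source: "dropbox", bytes: 1234 } },
  { users: ["pqr.pqr#pqr.com"], remote: { source: "google_drive", bytes: 785 } },
  { users: ["abc.abc#abc.com", "abc.xyz#xyz.com"], remote: { source: "gmail", bytes: 5647 } },
];

// Hypothetical helper: total bytes per source, nested under each user.
function bytesByUser(documents) {
  const totals = {};
  for (const doc of documents) {
    for (const user of doc.users) {                        // $unwind: "$users"
      totals[user] = totals[user] || {};
      const bySource = totals[user];
      const { source, bytes } = doc.remote;
      bySource[source] = (bySource[source] || 0) + bytes;  // $sum per (user, source)
    }
  }
  // Shape the result like the final $project stage: { username, bytes }.
  return Object.entries(totals).map(([username, bytes]) => ({ username, bytes }));
}

console.log(JSON.stringify(bytesByUser(assets), null, 2));
```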

How to get nested 3 level array object in Mongo Query?

Basically the structure is :
{
"_id" : ObjectId("123123"),
"stores" : [
{
"messages" : [
{
"updated_time" : "2018-05-15T05:12:25+0000",
"message_count" : 4,
"thread_id" : "123",
"messages" : [
{
"message" : "Hi User ",
"created_time" : "2018-05-15T05:12:25+0000",
"message_id" : "111",
},
{
"message" : "This is tes",
"created_time" : "2018-05-15T05:12:21+0000",
"message_id" : "222",
}
]
},
],
"store_id" : "123"
}
]
}
I have the following values and want to get the message object with message_id 111. How can I get this object? Any idea or help will be appreciated. Thanks.
store_id: 123,
thread_id:123,
message_id:111
The simplest way would be to $unwind all the nested arrays and then use $match to get a single document. You can also add $replaceRoot to get only the nested document. Try:
db.collection.aggregate([
{ $unwind: "$stores" },
{ $unwind: "$stores.messages" },
{ $unwind: "$stores.messages.messages" },
{ $match: { "stores.store_id": "123", "stores.messages.thread_id": "123", "stores.messages.messages.message_id": "111" } },
{ $replaceRoot: { newRoot: "$stores.messages.messages" } }
])
Prints:
{
"created_time": "2018-05-15T05:12:25+0000",
"message": "Hi User ",
"message_id": "111"
}
To improve performance, you can add a $match after every $unwind to filter out unnecessary data as early as possible:
db.collection.aggregate([
{ $unwind: "$stores" },
{ $match: { "stores.store_id": "123" } },
{ $unwind: "$stores.messages" },
{ $match: { "stores.messages.thread_id": "123" } },
{ $unwind: "$stores.messages.messages" },
{ $match: { "stores.messages.messages.message_id": "111" } },
{ $replaceRoot: { newRoot: "$stores.messages.messages" } }
])
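Filtering after every level, as in the second pipeline, corresponds to skipping non-matching branches before descending further. Below is a plain JavaScript sketch of that traversal on a trimmed copy of the sample document. findMessage is a hypothetical helper, and the code runs in memory without touching MongoDB.

```javascript
const doc = {
  stores: [{
    store_id: "123",
    messages: [{
      thread_id: "123",
      messages: [
        { message: "Hi User ", created_time: "2018-05-15T05:12:25+0000", message_id: "111" },
        { message: "This is tes", created_time: "2018-05-15T05:12:21+0000", message_id: "222" },
      ],
    }],
  }],
};

// Hypothetical helper: descend one level at a time, filtering as early as possible.
function findMessage(document, storeId, threadId, messageId) {
  for (const store of document.stores) {
    if (store.store_id !== storeId) continue;        // $match after the first $unwind
    for (const thread of store.messages) {
      if (thread.thread_id !== threadId) continue;   // $match after the second $unwind
      for (const message of thread.messages) {
        // final $match, then $replaceRoot returns just the nested message
        if (message.message_id === messageId) return message;
      }
    }
  }
  return null;
}

console.log(findMessage(doc, "123", "123", "111"));
```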

Need to sum from array object value in mongodb

I am trying to calculate the total value where that value exists, but my query is not working correctly. Can somebody help me solve this problem? Here are two sample documents; please look at them and suggest the best solution.
Document : 1
{
"_id" : 1,
"message_count" : 4,
"messages" : {
"data" : [
{
"id" : "11",
"saleValue": 1000
},
{
"id" : "112",
"saleValue": 1400
},
{
"id" : "22",
},
{
"id" : "234",
"saleValue": 111
}
],
},
"createdTime" : ISODate("2018-03-18T10:18:48.000Z")
}
Document : 2
{
"_id" : 444,
"message_count" : 4,
"messages" : {
"data" : [
{
"id" : "444",
"saleValue" : 2060
},
{
"id" : "444",
},
{
"id" : 234,
"saleValue" : 260
},
{
"id" : "34534",
}
]
},
"createdTime" : ISODate("2018-03-18T03:11:50.000Z")
}
Needed Output:
{
total : 4831
}
My query :
db.getCollection('myCollection').aggregate([
{
"$group": {
"_id": "$Id",
"totalValue": {
$sum: {
$sum: "$messages.data.saleValue"
}
}
}
}
])
So please if possible help me to solve this problem. Thanks in advance
It is not working correctly because it aggregates all the documents in the collection: you are grouping on the constant "_id": "tempId". You just need to reference the correct key by adding the $ prefix:
db.getCollection('myCollection').aggregate([
{ "$group": {
"_id": "$tempId",
"totalValue": {
"$sum": { "$sum": "$messages.data.saleValue" }
}
} }
])
This is in essence a single-stage version of a pipeline that first stores the per-document sum in an extra field and then applies $sum to that field in the $group stage.
The above works because, from MongoDB 3.2 onwards, $sum is available in both the $project and $group stages. When used in the $project stage, $sum returns the sum of a list of expressions. The expression "$messages.data.saleValue" resolves to the list of numbers in the array (for example [1000, 1400, 111] for Document 1), which $sum then adds up:
db.getCollection('myCollection').aggregate([
{ "$project": {
"values": { "$sum": "$messages.data.saleValue" },
"tempId": 1,
} },
{ "$group": {
"_id": "$tempId",
"totalValue": { "$sum": "$values" }
} }
])
You can add a $unwind stage before your $group. That deconstructs the data array so that you can group properly:
db.myCollection.aggregate([
{
"$unwind": "$messages.data"
},
{
"$group": {
"_id": "tempId",
"totalValue": {
$sum: "$messages.data.saleValue"
}
}
}
])
Output for the two sample documents:
{ "_id" : "tempId", "totalValue" : 4831 }
db.getCollection('myCollection').aggregate([
{
$unwind: "$messages.data"
},
{
$group: {
"_id": "tempId",
"totalValue": { $sum: "$messages.data.saleValue" }
}
}
])
Based on the description in the question, try the following aggregate query:
db.myCollection.aggregate(
// Pipeline
[
// Stage 1
{
$unwind: {
path: '$messages.data'
}
},
// Stage 2
{
$group: {
_id: {
pageId: '$pageId'
},
total: {
$sum: '$messages.data.saleValue'
}
}
},
// Stage 3
{
$project: {
pageId: '$_id.pageId',
total: 1,
_id: 0
}
}
]
);
You can do it without using $group, since grouping means the other fields have to be managed separately. I prefer using $sum and $map as shown below:
db.getCollection('myCollection').aggregate([
{
$addFields: {
total: {
$sum: {
$map: {
input: "$messages.data",
as: "message",
in: "$$message.saleValue",
},
},
},
},
}
])
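Note the difference between a per-document sum, which is what an $addFields stage yields, and a grand total across documents. The plain JavaScript sketch below computes both from the saleValue figures in the two sample documents. Entries without saleValue count as 0, which is how $sum treats missing or non-numeric values. The names are hypothetical, the _id values are placeholders, and nothing here calls MongoDB.

```javascript
const docs = [
  { _id: 1, messages: { data: [
    { id: "11", saleValue: 1000 }, { id: "112", saleValue: 1400 },
    { id: "22" }, { id: "234", saleValue: 111 },
  ] } },
  { _id: 444, messages: { data: [
    { id: "444", saleValue: 2060 }, { id: "444" },
    { id: 234, saleValue: 260 }, { id: "34534" },
  ] } },
];

// Sum saleValue over one document's data array, skipping missing values.
const sumSaleValues = (doc) =>
  doc.messages.data.reduce(
    (acc, m) => acc + (typeof m.saleValue === "number" ? m.saleValue : 0),
    0
  );

// Per-document totals (the $addFields / $map approach).
const perDocument = docs.map((d) => ({ _id: d._id, total: sumSaleValues(d) }));

// Grand total across all documents (what a $group on a constant key yields).
const grandTotal = perDocument.reduce((acc, d) => acc + d.total, 0);

console.log(perDocument, grandTotal);
```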