I need more help getting aggregated data from MongoDB

I have a collection with documents that look like this:
{
"_id" : ObjectId("bbbbbb1d9486c90479aaaaaa"),
"record" : {
"debug" : false,
"type" : "MX_GTI",
"products" : [
"DAM"
],
"agents" : [
{
"services" : "mssql",
"hpsAvg" : 772,
"hpsMax" : 42901
},
{
"services" : "mssql",
"hpsAvg" : 95,
"hpsMax" : 21631
},
{
"services" : "oracle",
"hpsAvg" : 0,
"hpsMax" : 0
},
{
"services" : "db2",
"hpsAvg" : 0,
"hpsMax" : 0
}
]
}
}
I need to find the average and max HPS per DB type (the services field) across all the agents in all records that match the condition "type": "MX_GTI".
The max is the largest hpsMax across all agents with that database type, and the average is the average of all the non-zero values of hpsAvg.
The output should look like this:
[
{
"dbtype": "oracle",
"maxHPS": 123456,
"avgHPS": 12345
},…
]
Thank you

The difficult part is getting the average right when some values are 0, because those values must be excluded.
$avg ignores non-numeric values (including null), so we can replace the 0 values with null before applying the average. $cond handles this transformation:
db.collection.aggregate([
{
$match: {
"record.type": "MX_GTI"
}
},
{
$unwind: "$record.agents"
},
{
$addFields: {
"record.agents.hpsAvg": {
$cond: {
if: {
$eq: [
"$record.agents.hpsAvg",
0
]
},
then: null,
else: "$record.agents.hpsAvg"
}
}
}
},
{
$group: {
_id: "$record.agents.services",
maxHPS: {
$max: "$record.agents.hpsMax"
},
avgHPS: {
$avg: "$record.agents.hpsAvg"
}
}
},
{
$addFields: {
dbtype: "$_id"
}
},
{
$project: {
_id: 0
}
}
])
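To make the zero handling concrete, here is a minimal sketch in plain JavaScript (run outside MongoDB) that mirrors what the pipeline above computes for the sample document. The names summarizeHps and sampleDocs are assumptions made up for this illustration; they are not part of the original answer.

```javascript
// Plain-JavaScript mirror of the pipeline above (not MongoDB code).
// summarizeHps and sampleDocs are hypothetical names used only for illustration.
function summarizeHps(docs) {
  const groups = new Map();
  for (const doc of docs) {
    if (doc.record.type !== "MX_GTI") continue; // $match
    for (const agent of doc.record.agents) { // $unwind
      const g = groups.get(agent.services) || { max: -Infinity, sum: 0, count: 0 };
      g.max = Math.max(g.max, agent.hpsMax); // $max
      if (agent.hpsAvg !== 0) { // zeros are skipped, just as null is under $avg
        g.sum += agent.hpsAvg;
        g.count += 1;
      }
      groups.set(agent.services, g);
    }
  }
  return [...groups].map(([dbtype, g]) => ({
    dbtype,
    maxHPS: g.max,
    avgHPS: g.count > 0 ? g.sum / g.count : null, // null when no non-zero value is left
  }));
}

const sampleDocs = [{
  record: {
    type: "MX_GTI",
    agents: [
      { services: "mssql", hpsAvg: 772, hpsMax: 42901 },
      { services: "mssql", hpsAvg: 95, hpsMax: 21631 },
      { services: "oracle", hpsAvg: 0, hpsMax: 0 },
      { services: "db2", hpsAvg: 0, hpsMax: 0 },
    ],
  },
}];

console.log(summarizeHps(sampleDocs));
```

For the sample document this gives mssql a maxHPS of 42901 and an avgHPS of 433.5, while oracle and db2 end up with an avgHPS of null because every hpsAvg there is 0.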

You can start from here. Note that this version does not yet exclude zero values from the average:
db.collection.aggregate([
{
$match: {
"record.type": "MX_GTI"
}
},
{
$unwind: "$record.agents"
},
{
$group: {
_id: "$record.agents.services",
"maxHPS": {
$max: "$record.agents.hpsMax"
},
"avgHPS": {
$avg: "$record.agents.hpsAvg"
}
}
}
])

Related

Count if a value exists inside the array of objects [ MongoDB ]

I am developing a simple chat application with MongoDB and got stuck.
A document in my database looks like this:
{
"_id" : ObjectId("605a217ed8168f4c262f4782"),
"message" : "Hi, This is a test message",
"created" : ISODate("2021-03-23T17:12:30.000Z"),
"user" : {
"_id" : ObjectId("5977af7df1d8cc4623283b14"),
"name" : "Sender Of Message"
},
"recipients" : [
{
"_id" : ObjectId("5977af7df1d8cc4623283b14"),
"time" : ISODate("2021-03-23T17:12:30.000Z")
},
{
"_id" : ObjectId("5df50a5eaa0e3c3104006101"),
"time" : ISODate("2021-03-23T17:12:35.000Z")
}
],
"target" : {
"_id" : ObjectId("5df50a5eaa0e3c3104006101"),
"name" : "Target Person"
},
"status" : 1
}
When I try to get the last message together with the user's unread count, I always get 1.
Here is the query that I tried:
db.collection.aggregate([
{ $match: { 'target._id': ObjectId('5df50a5eaa0e3c3104006101'), status: 1 } },
{ $sort: { _id: -1 } },
{
$group: {
_id: '$user._id',
doc: { $first: '$$ROOT' },
unread: {
$sum: {
$cond: {
if: { $ne: [ ObjectId('5df50a5eaa0e3c3104006101'), '$recipients._id' ] },
then: 1,
else: 0
}
}
}
}
}
])
Even if the collection contains just the one document above, the query should give 0, because the recipients array already contains an object with _id ObjectId('5df50a5eaa0e3c3104006101'), but I'm getting 1 for the unread count. Any help?
Here is the output that I get from the query:
{
"_id" : ObjectId("5977af7df1d8cc4623283b14"),
"doc" : {
"_id" : ObjectId("605a217ed8168f4c262f4782"),
"message" : "Hi, This is a test message",
"created" : ISODate("2021-03-23T17:12:30.000Z"),
"user" : {
"_id" : ObjectId("5977af7df1d8cc4623283b14"),
"name" : "Sender Of Message"
},
"recipients" : [
{
"_id" : ObjectId("5977af7df1d8cc4623283b14"),
"time" : ISODate("2021-03-23T17:12:30.000Z")
},
{
"_id" : ObjectId("5df50a5eaa0e3c3104006101"),
"time" : ISODate("2021-03-23T17:12:35.000Z")
}
],
"target" : {
"_id" : ObjectId("5df50a5eaa0e3c3104006101"),
"name" : "Target Person"
},
"status" : 1
},
"unread" : 1.0
}
I know why it's showing the count as 1:
The recipients array contains an object with _id ObjectId("5977af7df1d8cc4623283b14"), so the condition does not match. This causes the if condition to be satisfied and produce a value of 1.
But I need to figure out how to query it to get the actual value.
Please note that I can't use the $push operator on the recipients array, as it might hold a large number of objects in the future.
Thanks for the support, but I have found the answer myself.
Here is my approach to get the data as required.
Instead of searching for the records within the array, I did the following:
Filter the recipients array down to the target's _id, so the filtered array will have exactly one element or else be empty.
Then take the negation of the condition, i.e. when there is one element in the array the counter should get 0, otherwise 1.
So I used $filter to keep only the matching _id, $size to check the filtered array's size, and $sum to increment the counter as required.
db.collection.aggregate([
{ $match: { 'target._id': ObjectId('5df50a5eaa0e3c3104006101'), status: 1 } },
{ $sort: { _id: -1 } },
{
$group: {
_id: '$user._id',
doc: { $first: '$$ROOT' },
unread: {
$sum: {
$cond:{
if: {
$size: {
$filter: {
input: '$recipients',
as: 'item',
cond: { $eq: [ ObjectId('5df50a5eaa0e3c3104006101'), '$$item._id' ] }
}
}
},
then: 0,
else: 1
}
}
}
}
}
])
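As far as I can tell, the original query always produced 1 because '$recipients._id' resolves to an array of ids, and comparing an array to a single ObjectId with $ne is always true. The sketch below mirrors the $filter/$size check in plain JavaScript (outside MongoDB) to show the intended logic. The ids are plain strings instead of ObjectId values, the second message is invented for the example, and countUnread is a hypothetical helper name.

```javascript
// Plain-JavaScript mirror of the $filter / $size check (not MongoDB code).
// countUnread is a hypothetical helper; ids are strings instead of ObjectId values.
function countUnread(messages, targetId) {
  let unread = 0;
  for (const msg of messages) {
    // $filter: keep only the recipient entries whose _id equals the target
    const seenByTarget = msg.recipients.filter((r) => r._id === targetId);
    // $size: a non-zero size is truthy, so the message adds 0; an empty result adds 1
    unread += seenByTarget.length > 0 ? 0 : 1;
  }
  return unread;
}

const targetId = "5df50a5eaa0e3c3104006101";
const messages = [
  // the document from the question: the target is among the recipients
  { recipients: [{ _id: "5977af7df1d8cc4623283b14" }, { _id: targetId }] },
  // an invented second message that the target has not received yet
  { recipients: [{ _id: "5977af7df1d8cc4623283b14" }] },
];

console.log(countUnread(messages, targetId));
```

With only the first message the count is 0, which is the result the question expected; the invented second message raises it to 1.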
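Try it like this. Note that after $unwind the condition is evaluated once per recipient entry, so this counts non-matching recipient entries rather than unread messages; check the result against your data: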
db.getCollection('test').aggregate([
{ $match: { 'target._id': ObjectId('5df50a5eaa0e3c3104006101'), status: 1 } },
{ $sort: { _id: -1 } },
{ $unwind: { path: "$recipients", preserveNullAndEmptyArrays: true } },
{
$group: {
_id: '$user._id',
doc: { $first: '$$ROOT' },
unread: {
$sum: {
$cond: {
if: { $ne: [ '$recipients._id', ObjectId('5df50a5eaa0e3c3104006101') ] },
then: 1,
else: 0
}
}
}
}
}
])

Filter keys not in collection

How do we find keys which do not exist in a collection?
Given an input list of keys ['3321', '2121', '5647'], I want to return those that do not exist in the collection:
{ "_id" : { "$oid" : "5e2993b61886a22f400ea319" }, "scrip" : "5647" }
{ "_id" : { "$oid" : "5e2993b61886a22f400ea31a" }, "scrip" : "3553" }
So the expected output is ['3321', '2121']
This aggregation gets the desired output (works with MongoDB version 3.4 or later):
INPUT_ARRAY = ['3321', '2121', '5647']
db.test.aggregate( [
{
$match: {
scrip: {
$in: INPUT_ARRAY
}
}
},
{
$group: {
_id: null,
matches: { $push: "$scrip" }
}
},
{
$project: {
scrips_not_exist: { $setDifference: [ INPUT_ARRAY, "$matches" ] },
_id: 0
}
}
] )
The output:
{ "scrips_not_exist" : [ "3321", "2121" ] }
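The same idea can be sketched in plain JavaScript (outside MongoDB) to show what the three stages compute. findMissingKeys and the arrays below are hypothetical names for this illustration. One difference to keep in mind: this sketch preserves the input order, whereas the MongoDB documentation describes the element order of a $setDifference result as unspecified.

```javascript
// Plain-JavaScript mirror of $match + $group + $setDifference (not MongoDB code).
// findMissingKeys and the sample arrays are hypothetical names for illustration.
function findMissingKeys(inputKeys, docs) {
  const wanted = new Set(inputKeys);
  // $match + $group: collect the scrip values from the collection that are in the input
  const matches = new Set(docs.map((d) => d.scrip).filter((s) => wanted.has(s)));
  // $setDifference: input keys that have no match
  return inputKeys.filter((key) => !matches.has(key));
}

const inputArray = ["3321", "2121", "5647"];
const collectionDocs = [{ scrip: "5647" }, { scrip: "3553" }];

console.log(findMissingKeys(inputArray, collectionDocs));
```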

using mongo aggregation how to replace the fields names [duplicate]

I have a large collection of documents which represent events. The collection contains events for different userId values.
{
"_id" : ObjectId("57fd7d00e4b011cafdb90d22"),
"userId" : "123123123",
"userType" : "mobile",
"event_type" : "clicked_ok",
"country" : "US",
"timestamp" : ISODate("2016-10-12T00:00:00.308Z")
}
{
"_id" : ObjectId("57fd7d00e4b011cafdb90d22"),
"userId" : "123123123",
"userType" : "mobile",
"event_type" : "clicked_cancel",
"country" : "US",
"timestamp" : ISODate("2016-10-12T00:00:00.308Z")
}
At midnight I need to run an aggregation over all documents from the previous day. The documents need to be aggregated so that I get the number of each kind of event for a particular userId.
{
"userId" : "123123123",
"userType" : "mobile",
"country" : "US",
"clicked_ok" : 23,
"send_message" : 14,
"clicked_cancel" : 100,
"date" : "2016-11-24"
}
During aggregation I need to do two things:
calculate the number of events for a particular userId
add a "date" text field with the date
Any help is greatly appreciated! :)
You can do this with an aggregation like this:
db.user.aggregate([
{
$match:{
$and:[
{
timestamp:{
$gte: ISODate("2016-10-12T00:00:00.000Z")
}
},
{
timestamp:{
$lt: ISODate("2016-10-13T00:00:00.000Z")
}
}
]
}
},
{
$group:{
_id:"$userId",
timestamp:{
$first:"$timestamp"
},
send_message:{
$sum:{
$cond:[
{
$eq:[
"$event_type",
"send_message"
]
},
1,
0
]
}
},
clicked_cancel:{
$sum:{
$cond:[
{
$eq:[
"$event_type",
"clicked_cancel"
]
},
1,
0
]
}
},
clicked_ok:{
$sum:{
$cond:[
{
$eq:[
"$event_type",
"clicked_ok"
]
},
1,
0
]
}
}
}
},
{
$project:{
date:{
$dateToString:{
format:"%Y-%m-%d",
date:"$timestamp"
}
},
userId:1,
clicked_cancel:1,
send_message:1,
clicked_ok:1
}
}
])
Explanation:
keep only the documents for a specific day in the $match stage
group documents by userId and count the occurrences of each event in the $group stage
finally, format the timestamp field as yyyy-MM-dd in the $project stage
For the data you provided, this will output:
{
"_id":"123123123",
"send_message":0,
"clicked_cancel":1,
"clicked_ok":1,
"date":"2016-10-12"
}
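Here is a rough sketch of the same counting logic in plain JavaScript (outside MongoDB), using the two sample events from the question. countEventsPerUser and EVENT_TYPES are hypothetical names introduced only for this illustration.

```javascript
// Plain-JavaScript mirror of the $match / $group / $project stages (not MongoDB code).
// countEventsPerUser and EVENT_TYPES are hypothetical names for illustration.
const EVENT_TYPES = ["send_message", "clicked_cancel", "clicked_ok"];

function countEventsPerUser(events, dayStart, dayEnd) {
  const byUser = new Map();
  for (const e of events) {
    if (e.timestamp < dayStart || e.timestamp >= dayEnd) continue; // $match on the day
    let row = byUser.get(e.userId);
    if (!row) {
      row = {
        _id: e.userId,
        send_message: 0,
        clicked_cancel: 0,
        clicked_ok: 0,
        // first timestamp of the group, formatted as year-month-day in UTC
        date: e.timestamp.toISOString().slice(0, 10),
      };
      byUser.set(e.userId, row);
    }
    if (EVENT_TYPES.includes(e.event_type)) row[e.event_type] += 1; // $sum with $cond
  }
  return [...byUser.values()];
}

const events = [
  { userId: "123123123", event_type: "clicked_ok", timestamp: new Date("2016-10-12T00:00:00.308Z") },
  { userId: "123123123", event_type: "clicked_cancel", timestamp: new Date("2016-10-12T00:00:00.308Z") },
];

console.log(countEventsPerUser(
  events,
  new Date("2016-10-12T00:00:00.000Z"),
  new Date("2016-10-13T00:00:00.000Z")
));
```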
Check the following query, which groups by userId and day:
db.sandbox.aggregate([{
$group: {
_id: {
userId: "$userId",
date: {
$dateToString: { format: "%Y-%m-%d", date: "$timestamp" }}
},
send_message: {
$sum: {
$cond: { if: { $eq: ["$event_type", "send_message"] }, then: 1, else: 0 } }
},
clicked_cancel: {
$sum: {
$cond: { if: { $eq: ["$event_type", "clicked_cancel"] }, then: 1, else: 0 }
}
},
clicked_ok: {
$sum: {
$cond: { if: { $eq: ["$event_type", "clicked_ok"] }, then: 1, else: 0 }
}
}
}
}])

Need to sum from array object value in mongodb

I am trying to calculate a total of a value where that value exists, but my query is not working correctly. Can somebody help me solve this problem? Here are two sample documents; please look at them and suggest the best solution.
Document : 1
{
"_id" : 1,
"message_count" : 4,
"messages" : {
"data" : [
{
"id" : "11",
"saleValue": 1000
},
{
"id" : "112",
"saleValue": 1400
},
{
"id" : "22"
},
{
"id" : "234",
"saleValue": 111
}
]
},
"createdTime" : ISODate("2018-03-18T10:18:48.000Z")
}
Document : 2
{
"_id" : 444,
"message_count" : 4,
"messages" : {
"data" : [
{
"id" : "444",
"saleValue" : 2060
},
{
"id" : "444"
},
{
"id" : 234,
"saleValue" : 260
},
{
"id" : "34534"
}
]
},
"createdTime" : ISODate("2018-03-18T03:11:50.000Z")
}
Needed Output:
{
total : 4831
}
My query :
db.getCollection('myCollection').aggregate([
{
"$group": {
"_id": "$Id",
"totalValue": {
$sum: {
$sum: "$messages.data.saleValue"
}
}
}
}
])
So please if possible help me to solve this problem. Thanks in advance
It's not working correctly because it is aggregating all the documents in the collection: you are grouping on a constant, "_id": "tempId". You just need to reference the correct key by adding the $ prefix:
db.getCollection('myCollection').aggregate([
{ "$group": {
"_id": "$tempId",
"totalValue": {
"$sum": { "$sum": "$messages.data.saleValue" }
}
} }
])
This is in essence a single-stage version of a pipeline that first stores the per-document sum in an extra field in a $project stage and then applies $sum to that field in the $group stage.
The above works because, from MongoDB 3.2 onward, $sum is available in both the $project and $group stages, and when used in the $project stage it returns the sum of a list of expressions. The expression "$messages.data.value" returns a list of numbers, [120, 1200], which is then used as the $sum operand:
db.getCollection('myCollection').aggregate([
{ "$project": {
"values": { "$sum": "$messages.data.value" },
"tempId": 1,
} },
{ "$group": {
"_id": "$tempId",
"totalValue": { "$sum": "$values" }
} }
])
You can add an $unwind before your $group. That deconstructs the data array, so that you can then group properly:
db.myCollection.aggregate([
{
"$unwind": "$messages.data"
},
{
"$group": {
"_id": "tempId",
"totalValue": {
$sum: {
$sum: "$messages.data.value"
}
}
}
}
])
Output:
{ "_id" : "tempId", "totalValue" : 1320 }
db.getCollection('myCollection').aggregate([
{ $unwind: "$messages.data" },
{
$group: {
"_id": "tempId",
"totalValue": { $sum: "$messages.data.value" }
}
}
])
Based on the description in the question above, please try the following aggregate query:
db.myCollection.aggregate(
// Pipeline
[
// Stage 1
{
$unwind: {
path: '$messages.data'
}
},
// Stage 2
{
$group: {
_id: {
pageId: '$pageId'
},
total: {
$sum: '$messages.data.saleValue'
}
}
},
// Stage 3
{
$project: {
pageId: '$_id.pageId',
total: 1,
_id: 0
}
}
]
);
You can do it without using $group. With grouping, the other fields have to be managed and carried through explicitly, so I prefer using $sum and $map as shown below:
db.getCollection('myCollection').aggregate([
{
$addFields: {
total: {
$sum: {
$map: {
input: "$messages.data",
as: "message",
in: "$$message.saleValue",
},
},
},
},
}
])
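To check the numbers against the question's two sample documents, here is a minimal plain-JavaScript sketch (outside MongoDB) of the per-document sum. It assumes that entries without a saleValue simply contribute nothing, which is how $sum treats non-numeric values. totalSaleValue and sampleDocs are hypothetical names for this illustration.

```javascript
// Plain-JavaScript mirror of $sum over $map (not MongoDB code).
// totalSaleValue and sampleDocs are hypothetical names for illustration.
function totalSaleValue(doc) {
  return doc.messages.data
    .map((message) => message.saleValue) // $map: pick saleValue from each entry
    .filter((value) => typeof value === "number") // missing values are ignored
    .reduce((sum, value) => sum + value, 0); // $sum
}

const sampleDocs = [
  { _id: 1, messages: { data: [
    { id: "11", saleValue: 1000 },
    { id: "112", saleValue: 1400 },
    { id: "22" },
    { id: "234", saleValue: 111 },
  ] } },
  { _id: 444, messages: { data: [
    { id: "444", saleValue: 2060 },
    { id: "444" },
    { id: 234, saleValue: 260 },
    { id: "34534" },
  ] } },
];

const perDocument = sampleDocs.map(totalSaleValue);
const grandTotal = perDocument.reduce((sum, value) => sum + value, 0);
console.log(perDocument, grandTotal);
```

The per-document totals are 2511 and 2320, which add up to the 4831 requested in the question. Getting that single grand total out of MongoDB still needs a step that combines the documents, such as a $group.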

Using mongodb $lookup on a single collection

I have a collection with documents like this
{
"_id" : ObjectId("5773ac6a486f811694711875"),
"bsk" : {
"bskItems" : [
{
"id" : 4,
"bskItemLineType" : "SaleItem",
"product" : {
"description" : "reblochon"
}
},
{
"id" : 5,
"bskItemLineType" : "SaleItem",
"product" : {
"description" : "Pinot Noir"
}
},
{
"id" : 13,
"bskItemLineType" : "PromotionItem",
"promotionApplied" : {
"bskIds" : [
4,
5
]
}
},
{
"id" : 8,
"bskItemLineType" : "SaleItem",
"product" : {
"description" : "Food"
}
},
{
"id" : 10,
"bskItemLineType" : "SubTotalItem"
},
{
"id" : 12,
"bskItemLineType" : "TenderItem"
},
{
"id" : 14,
"bskItemLineType" : "ChangeDue"
}
]
}
}
I want an output where I can see the promotions applied and the descriptions of the items they applied to. For the document above, the promotion was applied to the items with "bsk.bskItems.id" 4 and 5, so I would like the output to be:
{
"_id": xxxxx,
"promotionAppliedto" : "reblochon"
},
{
"_id": xxxxx,
"promotionAppliedto" : "Pinot Noir"
}
the query below:
db.getCollection('DSTest').aggregate([
{$project:{"bsk.bskItems.product.description":1,"bsk.bskItems.id":1}},
{$unwind: "$bsk.bskItems"},
])
gets me the descriptions
db.getCollection('DSTest').aggregate([
{$project:{"bsk.bskItems.promotionApplied.bskIds":1}},
{$unwind: "$bsk.bskItems"},
{$unwind:"$bsk.bskItems.promotionApplied.bskIds"},
])
gets me the promotions applied. I was hoping to be able to use $lookup to join the two based on _id and bsk.bskItems.promotionApplied.bskIds and _id and bsk.bskItems.id, but I can't figure out how.
I don't know whether you solved your problem or whether this is still relevant, but I worked out an answer to your question:
db.DSTest.aggregate([
{
$unwind: "$bsk.bskItems"
},
{
$project: {
baItId: { $ifNull: [ "$bsk.bskItems.id", 0 ] },
"bsk": {
"bskItems": {
"promotionApplied": {
"bskIds": { $ifNull: [ "$bsk.bskItems.promotionApplied.bskIds", [0] ] }
}
}
},
"product": { $ifNull: [ "$bsk.bskItems.product.description", "" ] },
}
},
{
$unwind: "$bsk.bskItems.promotionApplied.bskIds"
},
{
$project: {
baItId: 1,
proAppliedId:
{
$cond: { if: { $eq: [ "$bsk.bskItems.promotionApplied.bskIds", 0 ] }, then: "$baItId", else: "$bsk.bskItems.promotionApplied.bskIds" }
},
product: 1
}
},
{
$group: {
_id: { proAppliedId: "$proAppliedId", docId: "$_id"},
product: { $push: { "p": "$product" } },
groupCount: { $sum: 1 }
}
},
{
$unwind: "$product"
},
{
$match: {
"product.p": {$ne: ""}, "groupCount": { $gt: 1}
}
},
{
$project: {
_id: "$_id.docId",
"promotionAppliedto": "$product.p"
}
}
])
With the dummy document you gave this is the result I get:
{
"_id" : ObjectId("5773ac6a486f811694711875"),
"promotionAppliedto" : "reblochon"
}
{
"_id" : ObjectId("5773ac6a486f811694711875"),
"promotionAppliedto" : "Pinot Noir"
}
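For comparison, the join the pipeline performs can be written directly in plain JavaScript (outside MongoDB). This is not the approach of the answer above, just a sketch of the expected result under the assumption that the whole document is available in memory. promotedDescriptions and sampleDoc are hypothetical names, and the _id is a plain string here.

```javascript
// Plain-JavaScript sketch of the join (not MongoDB code, not the answer's pipeline).
// promotedDescriptions and sampleDoc are hypothetical names for illustration.
function promotedDescriptions(doc) {
  const items = doc.bsk.bskItems;
  // Look-up table: basket item id -> product description
  const descriptionById = new Map(
    items.filter((item) => item.product).map((item) => [item.id, item.product.description])
  );
  const result = [];
  for (const item of items) {
    const ids = item.promotionApplied ? item.promotionApplied.bskIds : [];
    for (const id of ids) {
      if (descriptionById.has(id)) {
        result.push({ _id: doc._id, promotionAppliedto: descriptionById.get(id) });
      }
    }
  }
  return result;
}

const sampleDoc = {
  _id: "5773ac6a486f811694711875",
  bsk: { bskItems: [
    { id: 4, bskItemLineType: "SaleItem", product: { description: "reblochon" } },
    { id: 5, bskItemLineType: "SaleItem", product: { description: "Pinot Noir" } },
    { id: 13, bskItemLineType: "PromotionItem", promotionApplied: { bskIds: [4, 5] } },
    { id: 8, bskItemLineType: "SaleItem", product: { description: "Food" } },
    { id: 10, bskItemLineType: "SubTotalItem" },
  ] },
};

console.log(promotedDescriptions(sampleDoc));
```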
But my advice is to put some thought into your database structure next time. You had apples and pears, so we had to make an Asian pear to get this result. As the number of aggregation stages shows, it was not an easy job. It could have been much easier if you had separated the array elements that contain the field product from the ones that contain the field promotionApplied.
To break it down and explain what is happening step by step:
{
$unwind: "$bsk.bskItems"
}
By unwinding we flatten the array: each array element becomes its own document. We need this in order to access the fields inside the array and operate on them.
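As a rough illustration of what unwinding does, here is a plain-JavaScript sketch (outside MongoDB). It only handles a top-level array field, unlike the nested path used in the pipeline, and unwind and basket are hypothetical names for this example.

```javascript
// Plain-JavaScript sketch of the unwinding idea (not MongoDB code).
// Only a top-level array field is handled; unwind and basket are hypothetical names.
function unwind(doc, field) {
  // One output document per array element, with the array replaced by that element
  return doc[field].map((element) => ({ ...doc, [field]: element }));
}

const basket = { _id: 1, bskItems: [{ id: 4 }, { id: 5 }, { id: 13 }] };
const rows = unwind(basket, "bskItems");
console.log(rows);
```

Each of the three resulting documents keeps the original _id and carries exactly one basket item.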
{
$project: {
baItId: { $ifNull: [ "$bsk.bskItems.id", 0 ] },
"bsk": {
"bskItems": {
"promotionApplied": {
"bskIds": { $ifNull: [ "$bsk.bskItems.promotionApplied.bskIds", [0] ] }
}
}
},
"product": { $ifNull: [ "$bsk.bskItems.product.description", "" ] },
}
}
baItId: { $ifNull: [ "$bsk.bskItems.id", 0 ] }
With this line we make sure that every document gets a basket item id. In your case they all do; I just added it to be safe. If a document has no value for that field, we set it to 0 (you can set it to -1 or whatever you want).
"bsk": {
"bskItems": {
"promotionApplied": {
"bskIds": { $ifNull: [ "$bsk.bskItems.promotionApplied.bskIds", [0] ] }
}
}
}
Here we create an array for the field "$bsk.bskItems.promotionApplied.bskIds". Since not all documents have this field, we have to add it to all of them; otherwise we are comparing oranges with apples.
"product": { $ifNull: [ "$bsk.bskItems.product.description", "" ] }
As said before, we have to make all our documents look alike, so we also add $bsk.bskItems.product.description to the ones that don't have this field. For those, we set it to an empty string.
Now all our documents have the same structure and we can start with the actual sorting out.
{
$unwind: "$bsk.bskItems.promotionApplied.bskIds"
}
Since we want to access the ids inside $bsk.bskItems.promotionApplied.bskIds we have to unwind this array as well.
{
$project: {
baItId: 1,
proAppliedId:
{
$cond: { if: { $eq: [ "$bsk.bskItems.promotionApplied.bskIds", 0 ] }, then: "$baItId", else: "$bsk.bskItems.promotionApplied.bskIds" }
},
product: 1
}
}
baItId: 1 and product: 1 are just passed through. proAppliedId will contain the value from bsk.bskItems.promotionApplied.bskIds. If that value is 0, it gets the same id as the field $baItId; otherwise it keeps its id.
{
$group: {
_id: { proAppliedId: "$proAppliedId", docId: "$_id"},
product: { $push: { "p": "$product" } },
groupCount: { $sum: 1 }
}
}
Now we can finally group our documents by the proAppliedId that we created in the previous pipeline stage.
We also push the product values into an array, so there will now be arrays that contain two entries:
one with the value that we are looking for, and one with an empty string, which comes from "product": { $ifNull: [ "$bsk.bskItems.product.description", "" ] } in an earlier stage.
We also create a new field called groupCount to count the documents that were grouped together. The later $unwind and $match stages then keep only the groups with a groupCount greater than 1 and drop the empty strings.
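The grouping and filtering can be mirrored in plain JavaScript (outside MongoDB) to show why this picks out the promoted items. The rows below are what I would expect the sample document to look like after the second $project; the sketch groups only by proAppliedId because there is a single document, and groupAndFilter and unwoundRows are hypothetical names.

```javascript
// Plain-JavaScript mirror of $group, $unwind: "$product" and $match (not MongoDB code).
// groupAndFilter and unwoundRows are hypothetical names; a single document is assumed,
// so the docId part of the group key is left out.
function groupAndFilter(rows) {
  const groups = new Map();
  for (const row of rows) {
    const products = groups.get(row.proAppliedId) || [];
    products.push(row.product); // $push
    groups.set(row.proAppliedId, products);
  }
  const kept = [];
  for (const products of groups.values()) {
    if (products.length <= 1) continue; // groupCount must be greater than 1
    for (const product of products) {
      if (product !== "") kept.push(product); // drop the empty placeholders
    }
  }
  return kept;
}

const unwoundRows = [
  { proAppliedId: 4, product: "reblochon" },
  { proAppliedId: 5, product: "Pinot Noir" },
  { proAppliedId: 4, product: "" }, // from the promotion item, bskIds entry 4
  { proAppliedId: 5, product: "" }, // from the promotion item, bskIds entry 5
  { proAppliedId: 8, product: "Food" },
  { proAppliedId: 10, product: "" },
  { proAppliedId: 12, product: "" },
  { proAppliedId: 14, product: "" },
];

console.log(groupAndFilter(unwoundRows));
```

Only ids 4 and 5 appear twice, once as a sale item and once through the promotion, so only their descriptions survive the filter.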
{ $project: {
_id: "$_id.docId",
"promotionAppliedto": "$product.p" } }
In the final $project we just build the final document the way we want it to look.
I hope you now understand why thinking about where and how we save things matters.
With a document database, it would be better to store the promotion metadata instead of only the id.
Please see the example below:
"promotionApplied" : [{
bskId : 4,
name : "name",
otherData : "otherData"
}, {
bskId : 5,
name : "name5",
otherData : "otherData5"
}
]