Grouping MongoDB Data by Ignition Field Values: A Solution for Large Datasets


I want to group the data based on the values of the "ignition" field. If the "ignition" value is 1, all records with the value 1 should be grouped together until the next value of 0 is encountered, and so on.
I have 86400 records in MongoDB, and I want to query the data to achieve the desired output.
The data looks like this:
[
{
ignition: 1,
time: 112
},
{
ignition: 1,
time: 193
},
{
ignition: 0,
time: 115
},
{
ignition: 1,
time: 116
},
{
ignition: 1,
time: 117
},
{
ignition: 1,
time: 118
},
{
ignition: 0,
time: 119
},
{
ignition: 1,
time: 120
},
{
ignition: 1,
time: 121
},
{
ignition: 1,
time: 122
},
{
ignition: 0,
time: 123
},
]
I want the output like this:
{
time: [112,193],
time: [116,117,118],
time: [120,121,122]
}

db.collection.aggregate([
{
$setWindowFields: { //6. the output of this stage is, each set of adjacent documents having same $ignition will have a unique groupNum
partitionBy: null,
sortBy: {time: 1}, //4. from all documents sorted by $time
output: {
"groupNum": { //1. create a new field groupNum
$sum: { //2. by cumulatively adding
$cond: [
{$eq: ["$ignition",1]}, 0, 1 //3. modified $ignition field
]
},
window: {
documents: ["unbounded","current"] //5. starting from the beginning to current document
}
}
}
}
},
{
$match: {"ignition": 1} //7. retain $ignition : 1
},
{
$group: {
_id: "$groupNum", //8. group by groupNum
time: {$push: "$time"} //9. pushing the time to an array
}
},
{
$sort: {_id: 1} //10.sort as necessary
}
])
Demo
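To see what the running sum does, here is a minimal plain-JavaScript sketch of the same idea, run outside MongoDB. The helper name groupIgnitionRuns and the sample documents are made up for illustration; it only mirrors the pipeline's logic (sort by time, bump a counter on every non-1 ignition, collect the times of ignition 1 per counter value) and is not a replacement for the aggregation.

```javascript
// Hypothetical helper: mirrors the $setWindowFields / $match / $group logic in plain JS.
function groupIgnitionRuns(docs) {
  // Same ordering as sortBy: {time: 1}; copy first so the input is not mutated.
  const sorted = [...docs].sort((a, b) => a.time - b.time);
  let groupNum = 0;          // cumulative count of non-1 ignition documents seen so far
  const groups = new Map();  // groupNum -> array of times
  for (const doc of sorted) {
    if (doc.ignition !== 1) {
      groupNum += 1;         // every ignition 0 starts a new group number
      continue;              // and is dropped, like $match: {ignition: 1}
    }
    if (!groups.has(groupNum)) groups.set(groupNum, []);
    groups.get(groupNum).push(doc.time);
  }
  return [...groups.values()];
}

// Made-up sample data, not the data from the question.
const sample = [
  { ignition: 1, time: 1 },
  { ignition: 1, time: 2 },
  { ignition: 0, time: 3 },
  { ignition: 1, time: 4 },
  { ignition: 0, time: 5 },
  { ignition: 0, time: 6 },
  { ignition: 1, time: 7 },
];
const runs = groupIgnitionRuns(sample);
console.log(runs); // [[1, 2], [4], [7]]
```

Consecutive zeros just skip group numbers, which is harmless because only the documents with ignition 1 are kept.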

Related

How can I generate report from collection on daily, weekly and monthly basis MongoDB?

This is the structure of my collection
{"_id":{
"$oid":"61a5f45e7556f5670e50bd25"
},
"agent_id":"05046630001",
"c_id":null,
"agentName":"Testing",
"agent_intercom_id":"4554",
"campaign":[
"Campaig227"
],
"first_login_time":"28-12-2021 10:55:42 AM",
"last_logout_time":"21-01-2022 2:20:10 PM",
"parent_id":4663,
"total_call":2,
"outbound_call":1,
"iinbound_call":1,
"average_call_handling_time":56,
"logged_in_duration":2,
"total_in_call_time":30,
"total_break_duration":10,
"total_ring_time":2,
"available_time":40,
"ideal_time":0,
"occupancy":0,
"inbound_calls_missed":0,
"created_at":{
"$date":"2021-11-29T18:30:00.000Z"
}
}
I want to generate monthly result like this:
Agent | Campaign | Total call | Outgoing | Incoming | Average Call | Total Time | Idle Time
Agent 1 | Campaig227 | 148 | 38 | 62 | 12:00:18 | 12:46:45 | 0:23:57
Agent 2 | Campaig227 | 120 | 58 | 62 | 16:00:18 | 16:46:45 | 0:23:57
and daily report like:
Agent | Date | Campaign | Total call | Outgoing | Incoming | Average Call | Total Time | Idle Time
Agent 1 | 1/1/22 | Campaig2 | 14 | 10 | 4 | 4:00:18 | 4:46:45 | 0:46:26
Agent 1 | 2/1/22 | Campaig2 | 24 | 15 | 9 | 10:00:18 | 9:46:45 | 0:15:26
Agent 2 | 1/1/22 | Campaig1 | 16 | 10 | 6 | 4:00:18 | 4:46:45 | 0:46:26
Agent 2 | 2/1/22 | Campaig1 | 30 | 15 | 15 | 10:00:18 | 9:46:45 | 0:15:26
Please note that this is only sample data; the actual figures are different.
I tried to do this with an aggregation pipeline, but as I am new to MongoDB I am having difficulty writing the query.

One proposal would be this:
db.collection.aggregate([
{
$group: {
_id: {
agent_id: "$agent_id",
campaign: "$campaign",
date: {
$dateTrunc: {
date: "$created_at",
unit: "week",
timezone: "Europe/Zurich",
startOfWeek: "monday"
}
}
},
"Total call": { $sum: "$total_call" },
Outgoing: { $sum: "$outbound_call" },
Incoming: { $sum: "$iinbound_call" },
"Average Call": { $avg: "$total_in_call_time" },
"Total Time": { $sum: "$total_in_call_time" },
"Idle Time": { $sum: "$ideal_time" }
}
},
{
$set: {
"Average Call": { $dateToString: { date: { $toDate: { $multiply: ["$Average Call", 1000] } }, format: "%H:%M:%S" } },
"Total Time": { $dateToString: { date: { $toDate: { $multiply: ["$Total Time", 1000] } }, format: "%H:%M:%S" } },
"Idle Time": { $dateToString: { date: { $toDate: { $multiply: ["$Idle Time", 1000] } }, format: "%H:%M:%S" } }
}
},
{ $replaceWith: { $mergeObjects: ["$_id", "$$ROOT"] } },
{ $unset: "_id" }
])
Note that $dateToString with format "%H:%M:%S" only works for durations shorter than 24 hours; beyond that the hour part wraps around to 0.
Mongo Playground
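If the summed durations can exceed 24 hours, one option is to leave the totals as plain seconds in the pipeline and format them in application code. A minimal sketch, assuming the values are in seconds (as the multiplication by 1000 above implies); the helper name formatDuration is hypothetical:

```javascript
// Hypothetical helper: formats a number of seconds as H:MM:SS without wrapping at 24 hours.
function formatDuration(totalSeconds) {
  const s = Math.floor(totalSeconds);           // drop fractional seconds (e.g. from $avg)
  const hours = Math.floor(s / 3600);
  const minutes = Math.floor((s % 3600) / 60);
  const seconds = s % 60;
  const pad = (n) => String(n).padStart(2, "0");
  return `${hours}:${pad(minutes)}:${pad(seconds)}`;
}

console.log(formatDuration(46005)); // "12:46:45"
console.log(formatDuration(90000)); // "25:00:00"
```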

Creating objects from array of objects in a grouped mongo aggregation

I have been writing an aggregation pipeline to show a summarized version of data from a collection.
Sample Structure of Document:
{
_id: 'abcxyz',
eventCode: 'EVENTCODE01',
eventName: 'SOMEEVENT',
units: 1,
rate: 2,
cost: 2,
distribution: [
{
startDate: 2021-05-31T04:00:00.000+00:00,
units: 1
}
]
}
I have grouped it and merged the distribution into a single list with $unwind step before $group:
[
{
$unwind: {
path: '$distribution',
preserveNullAndEmptyArrays: false
}
},
{
$group: {
_id: {
eventName: '$eventName',
eventCode: '$eventCode'
},
totalUnits: {
$sum: '$units'
},
distributionList: {
$push: '$distribution'
},
perUnitRate: {
$avg: '$rate'
},
perUnitCost: {
$avg: '$cost'
}
}
}
]
Sample Output:
{
_id: {
eventName: 'EVENTNAME101',
eventCode: 'QQQ'
},
totalUnits: 7,
perUnitRate: 2,
perUnitCost: 2,
distributionList: [
{
startDate: 2021-05-31T04:00:00.000+00:00,
units: 1
},
{
startDate: 2021-05-31T04:00:00.000+00:00,
units: 1
},
{
startDate: 2021-06-07T04:00:00.000+00:00,
units: 1
}
]
}
I'm getting stuck at the next step; I want to consolidate the distributionList into a new List with no repeating startDate.
Example: Since first 2 objects of distributionList have the same startDate, it should be a single object in output with sum of units:
Expected:
{
_id: {
eventName: 'EVENTNAME101',
eventCode: 'QQQ'
},
totalUnits: 7,
perUnitRate: 2,
perUnitCost: 2,
newDistributionList: [
{
startDate: 2021-05-31T04:00:00.000+00:00,
units: 2 //units summed for first 2 objects
},
{
startDate: 2021-06-07T04:00:00.000+00:00,
units: 1
}
]
}
I couldn't use $unwind or $bucket as I intend to keep the grouping I did in previous steps ($group).
Can I get suggestions or a different approach if this doesn't seem accurate?

You may want to do the first $group at the eventName, eventCode, distribution.startDate level. Then you can $group again at the eventName, eventCode level and use $first to keep your original $group fields.
Here is the Mongo Playground to show the idea for your reference.
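A rough sketch of that two-level grouping, written as a pipeline array so it can be passed to aggregate. This is an assumption about how the idea could look, not the playground's exact code. It only covers totalUnits and the consolidated list; the rate and cost averages are left out on purpose, because averaging per-date averages in the second $group would weight them differently from the original single $group.

```javascript
// Sketch only: field names are taken from the question, the stage layout is an assumption.
const pipeline = [
  { $unwind: { path: "$distribution", preserveNullAndEmptyArrays: false } },
  {
    // First level: one group per event and startDate, summing the distribution units.
    $group: {
      _id: {
        eventName: "$eventName",
        eventCode: "$eventCode",
        startDate: "$distribution.startDate",
      },
      units: { $sum: "$distribution.units" },
      // Same quantity the question sums (document-level units per unwound element).
      totalUnits: { $sum: "$units" },
    },
  },
  {
    // Second level: back to one group per event, collecting one entry per startDate.
    $group: {
      _id: { eventName: "$_id.eventName", eventCode: "$_id.eventCode" },
      totalUnits: { $sum: "$totalUnits" },
      newDistributionList: {
        $push: { startDate: "$_id.startDate", units: "$units" },
      },
    },
  },
];

// Usage in the shell or a driver: db.collection.aggregate(pipeline)
```

The order of newDistributionList is not guaranteed; adding a $sort on "_id.startDate" between the two $group stages is one way to get a predictable order.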

Mongodb aggregate $gt returns non matching records

I would like to count the sum of a field in my database.
I have this pipeline in mongodb:
{
match: {
'user1': user.id,
'unreadMessagesCount': { $exists: true, $gt: 0 },
}
},
{
group: {
objectId: null,
total: { $sum: "$unreadMessagesCount" },
count: { $sum: 1 }
}
}
The result returned is
{total: 3, count: 30}
The total is correct because I only have 3 records, each with an unreadMessagesCount of 1. But the count returned is 30, which is wrong; only 3 records should match. When I remove group from the pipeline, I get 30 records.

How to aggregate data which an array field sum is between two values?

I have two values, minCount and maxCount.
In my model I have a field called counts, something like this:
{
createdAt: date
counts: [ 0,200,100] ==> Sum 300
},
{
createdAt: date
counts: [ 200,500,0] ==> Sum 700
},
{
createdAt: date
counts: [ 0,1100,100] ==> Sum 1200
},
I need to return the documents whose counts array sums to a value between minCount and maxCount, together with that sum.
Example:
minCount= 400
maxCount= 1300
Return
{
createdAt: date
total: 700
},
{
createdAt: date
total: 1200
},
In the first stage of the pipeline I already filter createdAt between two dates, like this:
Record.aggregate ([
{
$match: {
createdAt: {
$gte: new Date (req.body.startDate),
$lte: new Date (req.body.endDate),
},
},
},
{}, ==> This is where I need to compute the total of counts and apply the condition, which I could not work out.
])
I am fairly new to the aggregation pipeline, so please help.

Working example - https://mongoplayground.net/p/I6LOLhTA-yA
db.collection.aggregate([
{
"$project": {
"counts": 1,
"createdAt": 1,
"totalCounts": {
"$sum": "$counts"
}
}
},
{
"$match": {
"totalCounts": {
"$gte": 400,
"$lte": 1300
}
}
}
])
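To combine this with the date filter from the question, the stages could be built in one place. This is a sketch under assumptions: the function name buildCountsPipeline and its parameters are hypothetical, and the summed field is called total here to match the output asked for in the question.

```javascript
// Hypothetical helper: builds the date filter plus the sum-and-range filter as one pipeline.
function buildCountsPipeline(startDate, endDate, minCount, maxCount) {
  return [
    // Date range filter, as in the question's first stage.
    { $match: { createdAt: { $gte: new Date(startDate), $lte: new Date(endDate) } } },
    // In $project, $sum applied to an array field adds up its elements.
    { $project: { createdAt: 1, total: { $sum: "$counts" } } },
    // Keep only documents whose total falls in the requested range (inclusive).
    { $match: { total: { $gte: minCount, $lte: maxCount } } },
  ];
}

const pipeline = buildCountsPipeline("2022-01-01", "2022-01-31", 400, 1300);
// Usage with the model from the question: Record.aggregate(pipeline)
```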

How to retrieve the bucket boundaries when using mongo's $bucket aggregation?

I have a collection of 200,000+ records which include a float field amountAwarded (eg. 12345.67, 2342, 22 etc). I'm using MongoDB to aggregate these into buckets based on the following boundaries:
amountAwarded: [
{
$bucket: {
groupBy: '$amountAwarded',
boundaries: [0, 10000, 50000, 100000, 1000000, Infinity],
output: {
count: { $sum: 1 }
}
}
}
]
This works as expected and I get this output:
{
"amountAwarded": [
{
_id: 0,
count: 269
},
{
_id: 10000,
count: 67
},
// etc
]
}
What I really want is to reference the bucket boundaries in the output, eg:
{
"amountAwarded": [
{
_id: 0,
count: 269,
lowerBound: 0,
upperBound: 9999
}
]
}
This means I can construct a list on the frontend showing the buckets (eg. £0 - £9999).
The closest I've come is adding $min: "$amountAwarded" (and an equivalent $max) to the output, which gives me the upper/lower values for that field in the bucketed records. This isn't right though as the numbers are obviously from the data (eg. 8762) rather than the bucket bounds.
Is it possible to refer to the matched bucket boundaries inside the aggregation pipeline, or will I have to construct this manually after the facet is complete?

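You define the boundaries yourself, so you can add the fields in the next stage with $addFields, using a combination of $indexOfArray and $arrayElemAt. Note that the upperBound obtained this way is the next boundary, which $bucket treats as exclusive (so 10000 rather than 9999).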
Something like this:
db.collection.aggregate([
{
$bucket: {
groupBy: '$amountAwarded',
boundaries: [0, 10000, 50000, 100000, 1000000, Infinity],
output: {
count: { $sum: 1 }
}
}
},
{ $addFields: {
lowerBound: "$_id",
upperBound: { $arrayElemAt: [
[0, 10000, 50000, 100000, 1000000, Infinity],
{ $add: [
{ $indexOfArray: [
[0, 10000, 50000, 100000, 1000000, Infinity], "$_id"
] },
1
] }
] }
} }
])
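Since the boundaries array appears three times in that pipeline, it may be easier to define it once and build the stages from it. A small sketch of that idea; the function name buildBucketStages is hypothetical, and the stages are otherwise the same as above.

```javascript
// Hypothetical helper: builds the $bucket and $addFields stages from one boundaries array.
function buildBucketStages(boundaries) {
  return [
    {
      $bucket: {
        groupBy: "$amountAwarded",
        boundaries,
        output: { count: { $sum: 1 } },
      },
    },
    {
      $addFields: {
        lowerBound: "$_id", // $bucket uses the inclusive lower boundary as _id
        upperBound: {
          // Look up the boundary that follows the bucket's own lower boundary.
          $arrayElemAt: [
            boundaries,
            { $add: [{ $indexOfArray: [boundaries, "$_id"] }, 1] },
          ],
        },
      },
    },
  ];
}

const stages = buildBucketStages([0, 10000, 50000, 100000, 1000000, Infinity]);
// Usage: db.collection.aggregate(stages)
```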