MongoDB aggregate group by array elements

I have a MongoDB collection whose documents contain a customer id, a status (activate, deactivate) and a date.
[
{
id:1,
date:ISODate('2022-12-01'),
status:'activate'
},
{
id:2,
date:ISODate('2022-12-01'),
status:'activate'
},
{
id:1,
date:ISODate('2022-12-02'),
status:'deactivate'
},
{
id:2,
date:ISODate('2022-12-21'),
status:'deactivate'
}
]
I need to get a day-wise count of customer statuses.
I came up with the aggregation below.
db.collection.aggregate([
{
$addFields: {
"day": {
"$dateToString": {
"format": "%Y-%m-%d",
"date": "$date"
}
}
}
},
{
$group: {
_id: "$day",
type: {
$push: "$status"
}
}
}
])
This way I get the statuses in an array, like below.
[
{
_id:"2022-12-01",
type:[
0:"activate",
1:"activate"
]
},
{
_id:"2022-12-02",
type:[
0:"deactivate"
]
},
{
_id:"2022-12-21",
type:[
0:"deactivate"
]
}
]
This works as intended, but I need the output to look like below.
[
{
_id:"2022-12-01",
type:{
"activate":2,
}
},
{
_id:"2022-12-02",
type:{
"deactivate":1
}
},
{
_id:"2022-12-21",
type:{
"deactivate":1
}
}
]
This collection has around 100,000 documents, and doing this programmatically takes about 10 seconds. That's why I'm looking for a way to do it as an aggregation.

One option is to group twice and then use $arrayToObject:
db.collection.aggregate([
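{$group: {
// Group on the calendar day rather than the raw timestamp, so documents from the same day always share a bucket
_id: {day: {$dateToString: {format: "%Y-%m-%d", date: "$date"}}, status: "$status"},
count: {$sum: 1}
}},
{$group: {
_id: "$_id.day",
data: {$push: {k: "$_id.status", v: "$count"}}
}},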
{$project: {type: {$arrayToObject: "$data"}}}
])
See how it works on the playground example
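For intuition, here is a minimal plain-JavaScript sketch of the same "group twice, then build an object" idea, run in memory on the sample documents from the question. It is not a replacement for the pipeline; `countStatusPerDay` is a hypothetical helper name, and plain `Date` objects stand in for `ISODate`.

```javascript
// Sample documents from the question, with Date instead of ISODate.
const docs = [
  { id: 1, date: new Date("2022-12-01"), status: "activate" },
  { id: 2, date: new Date("2022-12-01"), status: "activate" },
  { id: 1, date: new Date("2022-12-02"), status: "deactivate" },
  { id: 2, date: new Date("2022-12-21"), status: "deactivate" },
];

// Hypothetical helper: returns [{ _id: "YYYY-MM-DD", type: { status: count } }].
function countStatusPerDay(documents) {
  // Step 1: count per (day, status) pair, like the first $group.
  const pairCounts = new Map();
  for (const doc of documents) {
    const day = doc.date.toISOString().slice(0, 10); // UTC day, like "%Y-%m-%d"
    const key = JSON.stringify([day, doc.status]);
    pairCounts.set(key, (pairCounts.get(key) || 0) + 1);
  }
  // Step 2: fold the pairs into one object per day,
  // like the second $group followed by $arrayToObject.
  const perDay = new Map();
  for (const [key, count] of pairCounts) {
    const [day, status] = JSON.parse(key);
    if (!perDay.has(day)) perDay.set(day, {});
    perDay.get(day)[status] = count;
  }
  return [...perDay].map(([day, type]) => ({ _id: day, type }));
}

console.log(JSON.stringify(countStatusPerDay(docs), null, 1));
```

The key point is that the counting happens per (day, status) pair first; only afterwards are the pairs collapsed into one object per day.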

Related

Mongodb aggregate grouping elements of an array type field

I have below data in my collection:
[
{
"_id":{
"month":"Jan",
"year":"2022"
},
"products":[
{
"product":"ProdA",
"status":"failed",
"count":15
},
{
"product":"ProdA",
"status":"success",
"count":5
},
{
"product":"ProdB",
"status":"failed",
"count":20
},
{
"product":"ProdB",
"status":"success",
"count":10
}
]
},
...//more such data
]
I want to group the elements of the products array by product name, so that we have a record of the success and failure counts of each product in each month. Every record is guaranteed to have both a success and a failure count each month. The output should look like below:
[
{
"_id":{
"month":"Jan",
"year":"2022"
},
"products":[
{
"product":"ProdA","status":[{"name":"success","count":5},{"name":"failed","count":15}]
},
{
"product":"ProdB","status":[{"name":"success","count":10},{"name":"failed","count":20}]
}
]
},
...//data for succeeding months
]
I have tried to do something like this:
db.collection.aggregate([{ $unwind: "$products" },
{
$group: {
"_id": {
month: "$_id.month",
year: "$_id.year"
},
products: { $push: { "product": "$product", status: { $push: { name: "$status", count: "$count" } } } }
}
}]);
But the above query doesn't work.
At which level do I need to group the fields to obtain the above output?
Please help me find out what I am doing wrong.
Thank you!
Your first group stage needs to group by both the _id and the product name and collect a list of status counts. A second group stage then forms the products list:
db.collection.aggregate([
{$unwind: "$products"},
{$group: {
_id: {
id: "$_id",
product: "$products.product",
},
status: {
$push: {
name: "$products.status",
count: "$products.count"
}
}
}
},
{$group: {
_id: "$_id.id",
products: {
$push: {
product: "$_id.product",
status: "$status"
}
}
}
}
])
Mongo Playground
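The regrouping can also be pictured on a single document in plain JavaScript. This is only an illustrative sketch of what `$unwind` plus the two `$group` stages produce; `groupProductsByName` is a hypothetical helper, not part of any MongoDB API.

```javascript
// One month document from the question.
const monthDoc = {
  _id: { month: "Jan", year: "2022" },
  products: [
    { product: "ProdA", status: "failed", count: 15 },
    { product: "ProdA", status: "success", count: 5 },
    { product: "ProdB", status: "failed", count: 20 },
    { product: "ProdB", status: "success", count: 10 },
  ],
};

// Hypothetical helper: nests the status entries under their product name.
function groupProductsByName(doc) {
  const byProduct = new Map();
  // Iterating the array plays the role of $unwind.
  for (const { product, status, count } of doc.products) {
    if (!byProduct.has(product)) byProduct.set(product, []);
    // Inner $push: collect { name, count } per (document, product).
    byProduct.get(product).push({ name: status, count });
  }
  // Outer $push: rebuild the products list for this document.
  return {
    _id: doc._id,
    products: [...byProduct].map(([product, status]) => ({ product, status })),
  };
}

console.log(JSON.stringify(groupProductsByName(monthDoc), null, 1));
```

A `$push` cannot be nested inside another `$push` in one `$group` stage, which is why the pipeline needs two stages where this sketch needs two levels of collection.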

How to use $match (multiple conditions) and $group in Mongodb

I have a list of records with the following fields: postBalance, agentId, createdAt, type. I want to filter by "type" and date. After that, I want to get the $last postBalance for each agent based on the filter and sum up the postBalance values. I have been struggling with this using the following query:
db.transaction.aggregate(
[{ $match: {
$and: [ {
createdAt: { $gte: ISODate('2022-09-15'), $lt:
('2022-09-16') } },
{ type: "CASH_OUT"}]}},
{
$group:
{
_id: {createdAt: {$last: "$createdAt"}},
totalAmount: { $sum: "$postBalance" },
}
}
]
)
This query returns an empty array even though there is data in the collection.
Below are samples of the documents
{
"_id": {
"$oid": "6334cefd0048787d5535ff16"
},
"type": "CASH_OUT",
"postBalance": {
"$numberDecimal": "23287.625"
},
"createdAt": {
"$date": {
"$numberLong": "1664405245000"
}
},
}
{
"_id": {
"$oid": "6334d438c1ab8a577677cbf3"
},
"userID": {
"$oid": "62f27bc29f51747015fdb941"
},
"aggregatorID": "0000116",
"transactionFee": {
"$numberDecimal": "0.0"
},
"type": "AIRTIME_VTU",
"postBalance": {
"$numberDecimal": "2114.675"
},
"walletHistoryID": 613266,
"walletID": 1720,
"walletActionAt": {
"$date": {
"$numberLong": "1664406584000"
}
}
}
{
"type": "FUNDS_TRANSFER",
"postBalance": {
"$numberDecimal": "36566.39"
},
"createdAt": {
"$date": {
"$numberLong": "1664407090000"
}
}
}
This is the output I am expecting
{
"date" : 2022-10-09,
"CASHOUT ": 897663,088,
"FUNDS_TRANSFER": 8900877,
"AIRTIME_VTU": 8890000
}
How can my query be aggregated to get this? Thanks
It looks like you want something like:
db.collection.aggregate([
{$match: {
createdAt: {
$gte: ISODate("2022-09-15T00:00:00.000Z"),
$lt: ISODate("2022-09-30T00:00:00.000Z")
}
}
},
{$group: {
_id: "$type",
createdAt: {$first: "$createdAt"},
totalAmount: {$sum: "$postBalance"}
}
},
{$group: {
_id: 0,
createdAt: {$first: "$createdAt"},
data: {$push: {k: "$_id", v: "$totalAmount"}}
}
},
{$project: {
data: {$arrayToObject: "$data"},
createdAt: 1,
_id: 0
}
},
{$set: {"data.date": "$createdAt"}},
{$replaceRoot: {newRoot: "$data"}}
])
See how it works on the playground example
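As a rough illustration of the pivot step, the plain-JavaScript sketch below sums postBalance per type and turns the resulting key/value pairs into one object, which is what `$arrayToObject` does with the `{k, v}` pairs. It assumes plain numbers instead of Decimal128 and that every record has a createdAt field; `totalsByType` is a hypothetical helper, and the second CASH_OUT record is made up for the example.

```javascript
// Records modelled on the question's samples (numbers instead of Decimal128).
const transactions = [
  { type: "CASH_OUT", postBalance: 23287.625, createdAt: new Date(1664405245000) },
  { type: "AIRTIME_VTU", postBalance: 2114.675, createdAt: new Date(1664406584000) },
  { type: "FUNDS_TRANSFER", postBalance: 36566.39, createdAt: new Date(1664407090000) },
  // Hypothetical extra record so that one type has more than one entry.
  { type: "CASH_OUT", postBalance: 100, createdAt: new Date(1664407100000) },
];

// Hypothetical helper: one object with a total per type plus a date field.
function totalsByType(records, from, to) {
  // $match on the date range (from inclusive, to exclusive).
  const inRange = records.filter((r) => r.createdAt >= from && r.createdAt < to);
  // First $group: sum postBalance per type.
  const totals = new Map();
  for (const r of inRange) {
    totals.set(r.type, (totals.get(r.type) || 0) + r.postBalance);
  }
  // $arrayToObject: [[k, v], ...] becomes { k: v, ... }.
  const result = Object.fromEntries(totals);
  // Like $first: "$createdAt", take the date of the first matching record.
  if (inRange.length > 0) result.date = inRange[0].createdAt.toISOString();
  return result;
}

console.log(totalsByType(transactions, new Date("2022-09-15"), new Date("2022-09-30")));
```

In the real collection the amounts are Decimal128, so the server-side `$sum` avoids the floating-point rounding that plain JavaScript numbers can introduce.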

MongoDB - get datewise/hourly aggregate count of column

I have a set of documents in my MongoDB collection. I am looking to get a date-wise aggregate count of documents if the date range is more than a day, and an hourly aggregate count if the date query covers a single day. The data may contain documents with the same conversationId, so it is necessary to group by conversationId as well. Below is a sample of the data for reference:
[
{
"_id":"c438a671-2391-4b85-815c-ecfcb3d2bb54",
"status":"INTERNAL_UPDATE",
"conversationId":"ac44781d-caab-4410-a708-9d6db8480fc3",
"messageIds":[],
"messageId":"4dc02026-ac06-4eb1-aa59-e385fcce4a36",
"responseId":"0c00c83d-61c5-4937-846c-2e6a46aae857",
"conversation":{},
"message":{},
"params":{},
"timestamp":"2021-05-04T11:40:06.552Z",
"source":{}
},
{
"_id":"98370ddf-9ff8-4347-bab7-1f7777ab9e9d",
"status":"NEW",
"conversationId":"b5dc39d2-56a1-4eb6-a728-cdbe33dca580",
"messageIds":[],
"messageId":"ba94b839-f795-44f2-aea0-173d26006f14",
"responseId":"a2b75364-447b-4345-8008-2beccd6cbb34",
"conversation":{},
"message":{},
"params":{},
"timestamp":"2021-05-05T11:40:30.897Z",
"source":{}
},
{
"_id":"db1eae2b-62d9-455c-ab46-dbfc5baf8b67",
"status":"INTERNAL_UPDATE",
"conversationId":"b5dc39d2-56a1-4eb6-a728-cdbe33dcb584",
"messageIds":[],
"messageId":"b83c743b-d36e-4fdd-9c03-21988af47263",
"responseId":"97198c09-0130-48dc-a225-6d0faeff3116",
"conversation":{},
"message":{},
"params":{},
"timestamp":"2021-05-05T11:40:31.418Z",
"source":{}
},
{
"_id":"12a21495-f857-4f18-a06e-f8ba0b951ade",
"status":"NEW",
"conversationId":"8e37c704-add8-4f9f-8e70-d630c24f653b",
"messageIds":[],
"messageId":"51a48362-545c-4f9f-930b-42e4841fc974",
"responseId":"4691468b-a43b-41d1-83df-1349fb554bfa",
"conversation":{},
"message":{},
"params":{},
"timestamp":"2021-05-06T11:43:58.174Z",
"source":{}
},
{
"_id":"4afaa735-4618-40cf-8b4f-00ee83b2c3c5",
"status":"INTERNAL_UPDATE",
"conversationId":"8e37c704-add8-4f9f-8e70-d630c24f653b",
"messageIds":[],
"messageId":"7c860126-bf1e-41b2-a7d3-6bcec3e8d5fb",
"responseId":"09cec9a1-2621-481d-b527-d98b007ef5be",
"conversation":{},
"message":{},
"params":{},
"timestamp":"2021-05-06T11:43:58.736Z",
"source":{}
},
{
"_id":"cf8deeca-2cfd-497e-b92b-03204c84217a",
"status":"NEW",
"conversationId":"3c6870b5-88d6-4e21-8629-28137dea3fee",
"messageIds":[],
"messageId":"da84e414-2269-4812-8ddd-e2cd6c9be4fd",
"responseId":"ae1014b2-0cc1-41f0-9990-cf724ed67ab7",
"conversation":{},
"message":{},
"params":{},
"timestamp":"2021-05-06T13:37:55.060Z",
"source":{}
}
]
Presently I am able to group by conversationId, but I am unable to get the data aggregated date-wise, or hourly when the date range covers a single day.
Below is the query I am using:
db.documentName.aggregate([
{
'$match': {
'$and': [
{
timestamp: {
'$gte': ISODate('2021-05-01T00:00:00.000Z'),
'$lte': ISODate('2021-05-10T23:59:59.999Z')
}
},
{ 'source.author': { '$regex': 'user', '$options': 'i' } },
{},
{}
]
}
},
{ '$group': {
_id: {'conversationId': '$conversationId'} } },
{ '$count': 'document_count' }
])
I have tried adding something like $hour: '$timestamp', comma-separated next to conversationId in $group, but it does not work and gives an error.
The desired result I am trying to get for above data is, something like this
[{"date": "2021-05-04", "doc_count": 1},
{"date": "2021-05-05", "doc_count": 2},
{"date": "2021-05-06", "doc_count": 2}]
For 2021-05-05 there are 2 docs with different conversationIds. For 2021-05-06 there are 3 docs in total, but 2 of them have the same conversationId, so the aggregate count for 2021-05-06 is also 2. I hope this clarifies my question.
The question is not entirely clear to me, but it sounds like you want something like the query below.
The groupId field rebuilds the date, keeping the hour only when the queried range is shorter than one day:
db.collection.aggregate([
{$match: {
timestamp: {
$gte: ISODate("2021-05-01T00:00:00.000Z"),
$lte: ISODate("2021-05-07T23:59:59.999Z")
}
}
},
{$project: {
conversationId: 1,
groupId: {
$dateFromParts: {
year: {$year: "$timestamp"},
month: {$month: "$timestamp"},
day: {$dayOfMonth: "$timestamp"},
hour: {$cond: [
{$gte: [
{$dateDiff: {
startDate: ISODate("2021-05-01T00:00:00.000Z"),
endDate: ISODate("2021-05-07T23:59:59.999Z"),
unit: "day"}}, 1]},
0,
{$hour: "$timestamp"}]}
}
}
}
},
{$group: {_id: {conversationId: "$conversationId", groupId: "$groupId"}}},
{$group: {_id: "$_id.groupId", doc_count: {$sum: 1}}},
{$project: {date: {$toString: "$_id"}, doc_count: 1, _id: 0}}
])
See how it works on the playground example
As suggested by @nimrodserok, for MongoDB version 4.2.9, where $dateDiff is not available, the query would be:
db.collection.aggregate([
{
$match: {
timestamp: {
$gte: ISODate("2021-05-01T00:00:00.000Z"),
$lte: ISODate("2021-05-07T23:59:59.999Z")
}
}
},
{
$project: {
conversationId: 1,
groupId: {
$dateFromParts: {
year: {
$year: "$timestamp"
},
month: {
$month: "$timestamp"
},
day: {
$dayOfMonth: "$timestamp"
},
hour: {
$cond: [
{
$gte: [
{
$subtract: [
{
$toLong: ISODate("2021-05-07T23:59:59.999Z")
},
{
$toLong: ISODate("2021-05-01T00:00:00.000Z")
}
]
},
86400000
]
},
0,
{
$hour: "$timestamp"
}
]
}
}
}
}
},
{
$group: {
_id: {
conversationId: "$conversationId",
groupId: "$groupId"
}
}
},
{
$group: {
_id: "$_id.groupId",
doc_count: {
$sum: 1
}
}
},
{
$project: {
date: {
$toString: "$_id"
},
doc_count: 1,
_id: 0
}
}
])
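The bucketing rule can be sketched in plain JavaScript as follows: pick a day or hour bucket depending on the length of the range, count each (bucket, conversationId) pair once, then count per bucket. This is an in-memory illustration under the assumption that timestamps parse as ISO strings; `countConversations` is a hypothetical helper, and only the fields that matter are kept from the sample data.

```javascript
const DAY_MS = 86400000;

// conversationId and timestamp of the six sample documents.
const docs = [
  { conversationId: "ac44781d-caab-4410-a708-9d6db8480fc3", timestamp: "2021-05-04T11:40:06.552Z" },
  { conversationId: "b5dc39d2-56a1-4eb6-a728-cdbe33dca580", timestamp: "2021-05-05T11:40:30.897Z" },
  { conversationId: "b5dc39d2-56a1-4eb6-a728-cdbe33dcb584", timestamp: "2021-05-05T11:40:31.418Z" },
  { conversationId: "8e37c704-add8-4f9f-8e70-d630c24f653b", timestamp: "2021-05-06T11:43:58.174Z" },
  { conversationId: "8e37c704-add8-4f9f-8e70-d630c24f653b", timestamp: "2021-05-06T11:43:58.736Z" },
  { conversationId: "3c6870b5-88d6-4e21-8629-28137dea3fee", timestamp: "2021-05-06T13:37:55.060Z" },
];

// Hypothetical helper: distinct conversations per day, or per hour
// when the queried range is shorter than one day.
function countConversations(documents, from, to) {
  const byHour = to - from < DAY_MS;
  const seen = new Set(); // (bucket, conversationId) pairs, like the first $group
  const counts = new Map(); // per-bucket totals, like the second $group
  for (const doc of documents) {
    const ts = new Date(doc.timestamp);
    if (ts < from || ts > to) continue; // $match
    // "YYYY-MM-DD" for days, "YYYY-MM-DDTHH" for hours (UTC).
    const bucket = ts.toISOString().slice(0, byHour ? 13 : 10);
    const pair = bucket + "|" + doc.conversationId;
    if (seen.has(pair)) continue;
    seen.add(pair);
    counts.set(bucket, (counts.get(bucket) || 0) + 1);
  }
  return [...counts].map(([date, doc_count]) => ({ date, doc_count }));
}

console.log(countConversations(docs, new Date("2021-05-01T00:00:00.000Z"), new Date("2021-05-10T23:59:59.999Z")));
console.log(countConversations(docs, new Date("2021-05-06T00:00:00.000Z"), new Date("2021-05-06T23:59:59.999Z")));
```

The first call buckets by day because the range spans several days; the second call covers a single day and therefore buckets by hour.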

MongoDB aggregation, group by items for the last 5 days

I'm trying to get a result in a particular form using MongoDB aggregation.
Here are sample documents from the collection:
[{
"_id": "34243243243",
"workType": "TESTWORK1",
"assignedDate":ISODate("2021-02-22T00:00:00Z"),
"status":"Completed",
},
{
"_id": "34243243244",
"workType": "TESTWORK2",
"assignedDate":ISODate("2021-02-21T00:00:00Z"),
"status":"Completed",
},
{
"_id": "34243243245",
"workType": "TESTWORK3",
"assignedDate":ISODate("2021-02-20T00:00:00Z"),
"status":"InProgress",
}...]
I need to group the last 5 days of data into an array of per-day counts for each workType, considering only documents with status Completed.
Expected result:
{_id: "TESTWORK1" , value: [1,0,4,2,3] ,
_id: "TESTWORK2" , value: [3,9,,3,5],
_id : "TESTWORK3", value: [,,,3,5]}
Here is what I'm trying to do, but not sure how to get the expected result.
db.testcollection.aggregate([
{$match:{"status":"Completed"}},
{$project: {_id:0,
assignedSince:{$divide:[{$subtract:[new Date(),$assignedDate]},86400000]},
workType:1
}
},
{$match:{"assignedSince":{"lte":5}}},
{$group : { _id:"workType", test :{$push:{day:"$assignedSince"}}}}
])
result: {_id:"TESTWORK1": test:[{5},{3}]} - here I'm getting the day, but I need the count of the workTypes on that day.
Is there any easy way to do this? Any help would be really appreciated.
Try this:
db.testcollection.aggregate([
{
$match: { "status": "Completed" }
},
{
$project: {
_id: 0,
assignedDate: 1,
assignedSince: {
$toInt: {
$divide: [{ $subtract: [new Date(), "$assignedDate"] }, 86400000]
}
},
workType: 1
}
},
{
$match: { "assignedSince": { "$lte": 5 } }
},
{
$group: {
_id: {
workType: "$workType",
assignedDate: "$assignedDate"
},
count: { $sum: 1 }
}
},
{
$group: {
_id: "$_id.workType",
values: { $push: "$count" }
}
}
]);
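Note that the pipeline above only pushes counts for days that actually have documents, in no guaranteed order. If a fixed-length array with zeros for empty days is wanted, one option is to fill the gaps in application code. The plain-JavaScript sketch below shows the idea in memory; `completedPerDay` is a hypothetical helper, index 0 is assumed to mean "today", and the window is assumed to be exactly 5 day slots.

```javascript
const DAY_MS = 86400000;

// Sample documents from the question, with Date instead of ISODate.
const docs = [
  { _id: "34243243243", workType: "TESTWORK1", assignedDate: new Date("2021-02-22T00:00:00Z"), status: "Completed" },
  { _id: "34243243244", workType: "TESTWORK2", assignedDate: new Date("2021-02-21T00:00:00Z"), status: "Completed" },
  { _id: "34243243245", workType: "TESTWORK3", assignedDate: new Date("2021-02-20T00:00:00Z"), status: "InProgress" },
];

// Hypothetical helper: per workType, an array of `days` counts where
// index 0 is the most recent day and missing days stay at 0.
function completedPerDay(documents, now, days = 5) {
  const result = new Map();
  for (const doc of documents) {
    if (doc.status !== "Completed") continue; // $match on status
    // Whole days elapsed since assignment, like $toInt of the $divide.
    const age = Math.floor((now - doc.assignedDate) / DAY_MS);
    if (age < 0 || age >= days) continue; // outside the window
    if (!result.has(doc.workType)) result.set(doc.workType, new Array(days).fill(0));
    result.get(doc.workType)[age] += 1;
  }
  return [...result].map(([workType, value]) => ({ _id: workType, value }));
}

// A fixed "now" keeps the example reproducible.
console.log(JSON.stringify(completedPerDay(docs, new Date("2021-02-22T12:00:00Z"))));
```

Passing `now` as a parameter instead of calling `new Date()` inside the function makes the result easy to check against known data.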

Mongodb Aggregation count array/set size

Here's my problem:
Model:
{ application: "abc", date: Time.now, status: "1", user_id: [ id1, id2, id4 ] }
{ application: "abc", date: Time.yesterday, status: "1", user_id: [ id1, id3, id5 ] }
{ application: "abc", date: Time.yesterday-1, status: "1", user_id: [ id1, id3, id5 ] }
I need to count the unique number of user_ids in a period of time.
Expected result:
{ application: "abc", status: "1", unique_id_count: 5 }
I'm currently using the aggregation framework and counting the ids outside mongodb.
{ $match: { application: "abc" } },
{ $unwind: "$users" },
{ $group: { _id: { status: "$status" }, users: { $addToSet: "$users" } } }
My arrays of user ids are very large, so I have to iterate over the dates or I hit the maximum document size limit (16 MB).
I could also $group by
{ year: { $year: "$date" }, month: { $month: "$date" }, day: { $dayOfMonth: "$date" } }
but I also get the document size limitation.
Is it possible to count the set size in mongodb?
thanks
The following returns the number of unique users for the matched application, per status. It applies a second group operation to the result of the first one, using MongoDB's aggregation pipeline.
{ $match: { application: "abc" } },
{ $unwind: "$users" },
{ $group: { _id: "$status", users: { $addToSet: "$users" } } },
{ $unwind:"$users" },
{ $group : {_id : "$_id", count : {$sum : 1} } }
Hopefully a future release of MongoDB will make this easier with an operator that gives the size of an array in a projection: {$project: {id: "$_id", count: {$size: "$uniqueUsers"}}}
https://jira.mongodb.org/browse/SERVER-4899
Cheers
Sorry I'm a little late to the party. Simply grouping on the 'user_id' and counting the result with a trivial group works just fine and doesn't run into doc size limits.
[
{$match: {application: 'abc', date: {$gte: startDate, $lte: endDate}}},
{$unwind: '$user_id'},
{$group: {_id: '$user_id'}},
{$group: {_id: 'singleton', count: {$sum: 1}}}
];
Use $size to get the size of the set.
[
{
$match: {"application": "abc"}
},
{
$unwind: "$user_id"
},
{
$group: {
"_id": "$status",
"application": {$first: "$application"},
"unique_user_id": {$addToSet: "$user_id"}
}
},
{
$project:{
"_id": "$_id",
"application": "$application",
"count": {$size: "$unique_user_id"}
}
}
]
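For comparison, the quantity all of these pipelines compute is simply the size of the union of the user_id arrays. A minimal plain-JavaScript sketch on the three sample documents from the question is shown below; `countUniqueUsers` is a hypothetical helper and the ids are written as strings for the example.

```javascript
// The three sample documents from the question (dates omitted).
const docs = [
  { application: "abc", status: "1", user_id: ["id1", "id2", "id4"] },
  { application: "abc", status: "1", user_id: ["id1", "id3", "id5"] },
  { application: "abc", status: "1", user_id: ["id1", "id3", "id5"] },
];

// Hypothetical helper: number of distinct user ids for one application.
function countUniqueUsers(documents, application) {
  const unique = new Set();
  for (const doc of documents) {
    if (doc.application !== application) continue; // $match
    // Iterating the array is the $unwind; Set.add is the $addToSet.
    for (const id of doc.user_id) unique.add(id);
  }
  return unique.size; // what $size reports on the collected set
}

console.log(countUniqueUsers(docs, "abc"));
```

Unlike this sketch, the `$addToSet` plus `$size` pipeline materializes the whole set in a single document, so it can still hit the 16 MB limit mentioned in the question; grouping on the id and then counting the groups avoids that.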