How to group multiple operations using MongoDB aggregation

Given the following data:
> db.users.find({}, {name: 1, createdAt: 1, updatedAt: 1}).limit(5).pretty()
{
"_id" : ObjectId("5ec8f74f32973c7b7cb7cce9"),
"createdAt" : ISODate("2020-05-23T10:13:35.012Z"),
"updatedAt" : ISODate("2020-08-20T13:37:09.861Z"),
"name" : "Patrick Jere"
}
{
"_id" : ObjectId("5ec8ef8a2b6e5f78fa20443c"),
"createdAt" : ISODate("2020-05-23T09:40:26.089Z"),
"updatedAt" : ISODate("2020-07-23T07:54:01.833Z"),
"name" : "Austine Wiga"
}
{
"_id" : ObjectId("5ed5e1a3962a3960ad85a1a2"),
"createdAt" : ISODate("2020-06-02T05:20:35.090Z"),
"updatedAt" : ISODate("2020-07-29T14:02:52.295Z"),
"name" : "Biasi Phiri"
}
{
"_id" : ObjectId("5ed629ec6d87382c608645d9"),
"createdAt" : ISODate("2020-06-02T10:29:00.204Z"),
"updatedAt" : ISODate("2020-06-02T10:29:00.204Z"),
"name" : "Chisambwe Kalusa"
}
{
"_id" : ObjectId("5ed8d21f42bc8115f67465a8"),
"createdAt" : ISODate("2020-06-04T10:51:11.546Z"),
"updatedAt" : ISODate("2020-06-04T10:51:11.546Z"),
"name" : "Wakun Moyo"
}
...
I use the following query to return new_users by month:
db.users.aggregate([
{
$group: {
_id: {$dateToString: {format: '%Y-%m', date: '$createdAt'}},
new_users: {
$sum: {$ifNull: [1, 0]}
}
}
}
])
Example result:
[
{
"_id": "2020-06",
"new_users": 125
},
{
"_id": "2020-07",
"new_users": 147
},
{
"_id": "2020-08",
"new_users": 43
},
{
"_id": "2020-05",
"new_users": 4
}
]
And this query returns new_users, active_users and total_users for a specific month:
db.users.aggregate([
{
$group: {
_id: null,
new_users: {
$sum: {
$cond: [{
$gte: ['$createdAt', ISODate('2020-08-01')]
}, 1, 0]
}
},
active_users: {
$sum: {
$cond: [{
$gt: ['$updatedAt', ISODate('2020-02-01')]
}, 1, 0]
}
},
total_users: {
$sum: {$ifNull: [1, 0]}
}
}
}
])
How can I get the second query to return results by month, just like the first query?
Expected results, based on a one-month filter:
[
{ _id: '2020-09', new_users: 0, active_users: 69},
{ _id: '2020-08', new_users: 43, active_users: 219},
{ _id: '2020-07', new_users: 147, active_users: 276},
{ _id: '2020-06', new_users: 125, active_users: 129},
{ _id: '2020-05', new_users: 4, active_users: 4}
]

You can try the aggregation below.
It first counts the new users for each year-month, then uses $lookup to count the active users, i.e. those whose updatedAt falls in the one-month window that ends at the start of that year-month.
db.users.aggregate([
{"$group":{
"_id":{"$dateFromParts":{"year":{"$year":"$createdAt"},"month":{"$month":"$createdAt"}}},
"new_users":{"$sum":1}
}},
{"$lookup":{
"from":"users",
"let":{"end_date":"$_id", "start_date":{"$dateFromParts":{"year":{"$year":"$_id"},"month":{"$subtract":[{"$month":"$_id"},1]}}}},
"pipeline":[
{"$match":{"$expr":
{"$and":[{"$gte":[
"$updatedAt",
"$$start_date"
]}, {"$lt":[
"$updatedAt",
"$$end_date"
]}]}
}},
{"$count":"activeUserCount"}
],
"as":"activeUsers"
}},
{"$project":{
"year-month":{"$dateToString":{"format":"%Y-%m","date":"$_id"}},
"new_users":1,
"active_users":{"$arrayElemAt":["$activeUsers.activeUserCount", 0]},
"_id":0
}}])

You can do the same as in your first query: group by createdAt. There is no need for the $ifNull operator in total_users, since $ifNull: [1, 0] always evaluates to 1.
Updated:
$facet to group by month and produce both counts
$project to concatenate both arrays using $concatArrays
$unwind to deconstruct the concatenated array
$group by month to merge the counts for each month
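The steps listed above can be sketched roughly as follows. The original playground code is not shown here, so this is a reconstruction under assumptions: a user is treated as active in the month of their updatedAt, and the names countByMonth, created, updated and merged are made up for illustration.

```javascript
// Hypothetical helper: sub-pipeline that counts documents per "%Y-%m"
// bucket of the given date field.
const countByMonth = (dateField, countField) => [
  {
    $group: {
      _id: { $dateToString: { format: "%Y-%m", date: `$${dateField}` } },
      [countField]: { $sum: 1 }
    }
  }
];

const pipeline = [
  // 1. Run both monthly counts over the same input documents
  {
    $facet: {
      created: countByMonth("createdAt", "new_users"),
      updated: countByMonth("updatedAt", "active_users")
    }
  },
  // 2. Concatenate the two result arrays
  { $project: { merged: { $concatArrays: ["$created", "$updated"] } } },
  // 3. One document per array element
  { $unwind: "$merged" },
  // 4. Merge the two counts that share the same month
  {
    $group: {
      _id: "$merged._id",
      new_users: { $sum: "$merged.new_users" },
      active_users: { $sum: "$merged.active_users" }
    }
  },
  { $sort: { _id: -1 } }
];

// In the shell: db.users.aggregate(pipeline)
```

A month that appears in only one of the two facets still gets both fields, because $sum treats a missing field as 0.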

Related

MongoDB : not able to get the field 'name' which has the max value in the two similar sub-documents

I have a test collection:
{
"_id" : ObjectId("5exxxxxx03"),
"username" : "abc",
"col1" : [
{
"colId" : 1,
"col2" : [
{
"name" : "a",
"value" : 10
},
{
"name" : "b",
"value" : 20
},
{
"name" : "c",
"value" : 30
}
],
"col3" : [
{
"name" : "d",
"value" : 15
},
{
"name" : "e",
"value" : 25
},
{
"name" : "f",
"value" : 35
}
]
}
]
}
col1 holds a list of sub-documents, each with two arrays, col2 and col3, which have the same shape but convey different meanings. Their elements have name and value as fields.
Now I need to find the max value from col2 or col3 and its corresponding name.
I tried the query below:
db.test.aggregate([
{$unwind: '$col1'},
{$unwind: '$col1.col2'},
{$unwind: '$col1.col3'},
{$group:
{_id: '$col1.colId',
maxCol2: {$max: '$col1.col2.value'},
maxCol3: {$max: '$col1.col3.value'}}},
{$project:
{maxValue: {$max: ['$maxCol2', '$maxCol3']},
name: {$cond: [
{$eq: ['$maxValue', '$maxCol2']},
'$col1.col2.name',
'$col1.col3.name']}}}]).pretty()
But it returned the following, without the name field:
{ "_id" : 1, "maxValue" : 35 }
So, just to check whether my condition is correct, I tried the following query ($col1.col2.name and $col1.col3.name replaced with the strings '111' and '222'):
db.test.aggregate([
{$unwind: '$col1'},
{$unwind: '$col1.col2'},
{$unwind: '$col1.col3'},
{$group:
{_id: '$col1.colId',
maxCol2: {$max: '$col1.col2.value'},
maxCol3: {$max: '$col1.col3.value'}}},
{$project:
{maxValue: {$max: ['$maxCol2', '$maxCol3']},
name: {$cond: [
{$eq: ['$maxValue', '$maxCol2']},
'111',
'222']}}}]).pretty()
Which gives me the expected output:
{ "_id" : 1, "maxValue" : 35, "name" : "222" }
Could anyone explain why I am not getting the correct answer, and how I should query this to get the correct output?
The correct output should be:
{ "_id" : 1, "maxValue" : 35, "name" : "f" }
P.S. - I'm a beginner.
You can use the aggregation below:
db.collection.aggregate([
{ "$project": {
"col1": {
"$max": {
"$reduce": {
"input": "$col1",
"initialValue": [],
"in": {
"$concatArrays": [
"$$this.col2",
"$$value",
"$$this.col3"
]
}
}
}
}
}}
])
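One caveat with applying $max to whole sub-documents: as far as I understand the BSON comparison order, embedded documents are compared field by field in stored order, so name would be compared before value. A hedged alternative is to pick the entry by value explicitly. The sketch below works per document rather than per colId, and the names best and maxEntry are made up for illustration.

```javascript
const pipeline = [
  {
    $project: {
      best: {
        $reduce: {
          // Flatten every col2 and col3 entry into one array
          input: {
            $reduce: {
              input: "$col1",
              initialValue: [],
              in: { $concatArrays: ["$$value", "$$this.col2", "$$this.col3"] }
            }
          },
          initialValue: null,
          // Keep whichever entry has the larger `value`
          in: {
            $cond: [
              { $or: [{ $eq: ["$$value", null] }, { $gt: ["$$this.value", "$$value.value"] }] },
              "$$this",
              "$$value"
            ]
          }
        }
      }
    }
  },
  { $project: { maxValue: "$best.value", name: "$best.name" } }
];

// Plain-JS mirror of the same logic, to check the idea without a database
function maxEntry(doc) {
  const all = doc.col1.flatMap((c) => [...c.col2, ...c.col3]);
  return all.reduce((best, e) => (best === null || e.value > best.value ? e : best), null);
}

// Sample where name order and value order disagree
const sample = {
  col1: [{ colId: 1, col2: [{ name: "z", value: 1 }], col3: [{ name: "a", value: 99 }] }]
};
console.log(maxEntry(sample)); // { name: 'a', value: 99 }
```

In the shell the pipeline would be run as db.collection.aggregate(pipeline).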
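Try this one.
Explanation: after $group, only _id and the accumulator fields remain, so $col1.col2.name and $col1.col3.name no longer exist when $project runs. Also, $maxValue cannot be referenced in the same $project stage that defines it.
We need to carry the col2 and col3 entries through $group as extra fields. Once the max value is calculated, we retrieve the name that belongs to it.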
db.collection.aggregate([
{
$unwind: "$col1"
},
{
$unwind: "$col1.col2"
},
{
$unwind: "$col1.col3"
},
{
$group: {
_id: "$col1.colId",
maxCol2: {
$max: "$col1.col2.value"
},
maxCol3: {
$max: "$col1.col3.value"
},
col2: {
$addToSet: "$col1.col2"
},
col3: {
$addToSet: "$col1.col3"
}
}
},
{
$project: {
maxValue: {
$filter: {
input: {
$cond: [
{
$gt: [
"$maxCol2",
"$maxCol3"
]
},
"$col2",
"$col3"
]
},
cond: {
$eq: [
"$$this.value",
{
$cond: [
{
$gt: [
"$maxCol2",
"$maxCol3"
]
},
"$maxCol2",
"$maxCol3"
]
}
]
}
}
}
}
},
{
$unwind: "$maxValue"
},
{
$project: {
_id: 1,
maxValue: "$maxValue.value",
name: "$maxValue.name"
}
}
])

MongoDB: arrays from aggregation result

I have the following MongoDB query:
db.my_collection.aggregate([
{
$group: {"_id":"$day", count: { $sum: "$myValue" }
}}])
It returns the following result:
{
"_id" : ISODate("2020-02-10T00:00:00.000+01:00"),
"count" : 10
},
{
"_id" : ISODate("2020-02-01T00:00:00.000+01:00"),
"count" : 2
}
Is it possible to make two arrays from this result as below?
{
"days": [ISODate("2020-02-10T00:00:00.000+01:00"), ISODate("2020-02-01T00:00:00.000+01:00")],
"values": [10, 2]
}
Yes, just add another $group stage:
db.my_collection.aggregate([
{
$group: {
"_id": "$day", count: {$sum: "$myValue"}
}
},
{
$group: {
"_id": null,
days: {$push: "$_id"},
values: {$push: "$count"}
}
}
])
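If the order of the two arrays matters, a $sort stage between the two $group stages is a common way to pin it down, since $push appends values in the order the documents arrive. This is a minor variation on the query above, not part of the original answer. The toArrays function is a made-up plain-JS mirror of the idea.

```javascript
const pipeline = [
  { $group: { _id: "$day", count: { $sum: "$myValue" } } },
  // Fix the order before collecting into arrays
  { $sort: { _id: 1 } },
  { $group: { _id: null, days: { $push: "$_id" }, values: { $push: "$count" } } },
  // Drop the constant null _id from the result
  { $project: { _id: 0 } }
];

// Plain-JS mirror: sort the per-day groups, then split them into two parallel arrays
function toArrays(groups) {
  const sorted = [...groups].sort((a, b) => a._id - b._id);
  return { days: sorted.map((g) => g._id), values: sorted.map((g) => g.count) };
}

const out = toArrays([
  { _id: new Date("2020-02-10"), count: 10 },
  { _id: new Date("2020-02-01"), count: 2 }
]);
console.log(out.values); // [ 2, 10 ]
```

In the shell: db.my_collection.aggregate(pipeline).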

How can I count total documents and grouped counts simultaneously in a MongoDB aggregation?

I have a dataset in a MongoDB collection named visitorsSession, like this:
{ip : '192.2.1.1',country : 'US', type : 'Visitors',date : '2019-12-15T00:00:00.359Z'},
{ip : '192.3.1.8',country : 'UK', type : 'Visitors',date : '2019-12-15T00:00:00.359Z'},
{ip : '192.5.1.4',country : 'UK', type : 'Visitors',date : '2019-12-15T00:00:00.359Z'},
{ip : '192.8.1.7',country : 'US', type : 'Visitors',date : '2019-12-15T00:00:00.359Z'},
{ip : '192.1.1.3',country : 'US', type : 'Visitors',date : '2019-12-15T00:00:00.359Z'}
I am using this aggregation:
[{$match: {
nsp : "/hrm.sbtjapan.com",
creationDate : {
$gte: "2019-12-15T00:00:00.359Z",
$lte: "2019-12-20T23:00:00.359Z"
},
type : "Visitors"
}}, {$group: {
_id : "$country",
totalSessions : {
$sum: 1
}
}}, {$project: {
_id : 0,
country : "$_id",
totalSessions : 1
}}, {$sort: {
country: -1
}}]
Using the above aggregation, I get results like this:
[{country : 'US',totalSessions : 3},{country : 'UK',totalSessions : 2}]
But I also want the total number of visitors along with the result, like totalVisitors : 5.
How can I do this in a MongoDB aggregation?
You can use the $facet aggregation stage to calculate total visitors as well as visitors by country in a single pass:
db.visitorsSession.aggregate( [
{
$match: {
nsp : "/hrm.sbtjapan.com",
creationDate : {
$gte: "2019-12-15T00:00:00.359Z",
$lte: "2019-12-20T23:00:00.359Z"
},
type : "Visitors"
}
},
{
$facet: {
totalVisitors: [
{
$count: "count"
}
],
countrySessions: [
{
$group: {
_id : "$country",
sessions : { $sum: 1 }
}
},
{
$project: {
country: "$_id",
_id: 0,
sessions: 1
}
}
],
}
},
{
$addFields: {
totalVisitors: { $arrayElemAt: [ "$totalVisitors.count" , 0 ] },
}
}
] )
The output:
{
"totalVisitors" : 5,
"countrySessions" : [
{
"sessions" : 2,
"country" : "UK"
},
{
"sessions" : 3,
"country" : "US"
}
]
}
You may be better off running two queries for this.
To save the two database round trips, the following aggregation can be used, although in my opinion it is rather verbose just to count documents (and may be somewhat expensive if the documents are very large).
Idea: put a $group at the top to count the documents and preserve the originals using $push and $$ROOT. Then, before any other match/filter stages, $unwind the resulting array of original documents.
db.collection.aggregate([
{
$group: {
_id: null,
docsCount: {
$sum: 1
},
originals: {
$push: "$$ROOT"
}
}
},
{
$unwind: "$originals"
},
{ $match: "..." }, //and other stages on `originals` which contains the source documents
{
$group: {
_id: "$originals.country",
totalSessions: {
$sum: 1
},
totalVisitors: {
$first: "$docsCount"
}
}
}
]);
Sample output:
[
{
"_id": "UK",
"totalSessions": 2,
"totalVisitors": 5
},
{
"_id": "US",
"totalSessions": 3,
"totalVisitors": 5
}
]

MongoDB aggregation: average sales per hour

I have a collection with sales. Now I need to get the average number of sales per hour within a date range.
Up to now I have a query like this:
db.getCollection('sales').aggregate({
"$match": {
$and: [
{ "createdAt": { $gte: ISODate("2018-05-01T00:00:00.000Z") } },
{ "createdAt": { $lt: ISODate("2018-10-30T23:59:00.000Z") } },
]
}
},{
"$project": {
"h":{"$hour":"$createdAt"},
}
},{
"$group":{
"_id": "$h",
"salesPerHour": { $sum: 1 },
},
},{
"$sort": { "salesPerHour": -1 }
});
The result looks like this: {"_id" : 15, "salesPerHour" : 681.0}
How can I get the average value of salesPerHour instead of the sum?
Update 1: example document.
{
"_id" : "pX6jj7j4274J9xpSA",
"idFiscalSale" : "48",
"documentYear" : "2018",
"paymentType" : "cash",
"cashReceived" : 54,
"items" : [...],
"customer" : null,
"subTotal" : 23.89,
"taxTotal" : 3.7139,
"total" : 23.89,
"rewardPointsValue" : 0,
"rewardPointsEarned" : 24,
"discountValue" : 0,
"createdAt" : ISODate("2018-04-24T00:00:00.201Z")
}
You can use the aggregation query below. It counts the sales for each hour of the day, then averages those counts into a single value:
db.sales.aggregate([
{"$match":{
"createdAt":{
"$gte":ISODate("2018-05-01T00:00:00.000Z"),
"$lt":ISODate("2018-10-30T23:59:00.000Z")
}
}},
{"$group":{
"_id":{"$hour":"$createdAt"},
"salesPerHour":{"$sum":1}
}},
{"$group":{
"_id":null,
"salesPerHour":{"$avg":"$salesPerHour"}
}}
])
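To make the two $group stages above concrete, here is a small plain-JavaScript mirror of the same arithmetic on in-memory data. It is only an illustration: the function name averageOfHourlyCounts and the sample dates are made up, and it returns null for empty input, whereas the pipeline would simply return no document.

```javascript
// Stage 1: count sales per hour of day. Stage 2: average those counts.
function averageOfHourlyCounts(sales) {
  const counts = new Map();
  for (const { createdAt } of sales) {
    // $hour works in UTC unless a timezone is supplied
    const hour = createdAt.getUTCHours();
    counts.set(hour, (counts.get(hour) || 0) + 1);
  }
  const values = [...counts.values()];
  if (values.length === 0) return null; // no hour buckets at all
  return values.reduce((a, b) => a + b, 0) / values.length;
}

const sample = [
  { createdAt: new Date("2018-05-01T15:10:00Z") },
  { createdAt: new Date("2018-05-02T15:45:00Z") },
  { createdAt: new Date("2018-05-02T09:00:00Z") }
];

// Hour 15 has 2 sales, hour 9 has 1, so the average over the buckets is 1.5
console.log(averageOfHourlyCounts(sample)); // 1.5
```

Note that only hours with at least one sale form a bucket, so hours without any sales do not pull the average down.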
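You can try the aggregation below.
$avg needs an existing salesPerHour value to average, so first count the sales for each day and hour, then average those counts per hour of the day:
db.collection.aggregate([
  { "$match": {
    "createdAt": {
      "$gte": ISODate("2018-05-01T00:00:00.000Z"),
      "$lt": ISODate("2018-10-30T23:59:00.000Z")
    }
  }},
  // Count the sales for each calendar day and hour
  { "$group": {
    "_id": {
      "day": { "$dateToString": { "format": "%Y-%m-%d", "date": "$createdAt" } },
      "hour": { "$hour": "$createdAt" }
    },
    "salesPerHour": { "$sum": 1 }
  }},
  // Average the daily counts for each hour of the day
  // (days with no sales in a given hour are not counted as 0)
  { "$group": {
    "_id": "$_id.hour",
    "salesPerHour": { "$avg": "$salesPerHour" }
  }}
])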

Using Mongo aggregation, how to replace the field names [duplicate]

I have a large collection of documents that represent events. The collection contains events for different userId values.
{
"_id" : ObjectId("57fd7d00e4b011cafdb90d22"),
"userId" : "123123123",
"userType" : "mobile",
"event_type" : "clicked_ok",
"country" : "US",
"timestamp" : ISODate("2016-10-12T00:00:00.308Z")
}
{
"_id" : ObjectId("57fd7d00e4b011cafdb90d22"),
"userId" : "123123123",
"userType" : "mobile",
"event_type" : "clicked_cancel",
"country" : "US",
"timestamp" : ISODate("2016-10-12T00:00:00.308Z")
}
At midnight I need to run an aggregation over all documents for the previous day. The documents need to be aggregated so that I get the number of each kind of event for a particular userId.
{
"userId" : "123123123",
"userType" : "mobile",
"country" : "US",
"clicked_ok" : 23,
"send_message" : 14,
"clicked_cancel" : 100,
"date" : "2016-11-24",
}
During aggregation I need to do two things:
calculate the number of events for a particular userId
add a "date" text field with the date
Any help is greatly appreciated! :)
You can do this with an aggregation like this:
db.user.aggregate([
{
$match:{
$and:[
{
timestamp:{
$gte: ISODate("2016-10-12T00:00:00.000Z")
}
},
{
timestamp:{
$lt: ISODate("2016-10-13T00:00:00.000Z")
}
}
]
}
},
{
$group:{
_id:"$userId",
timestamp:{
$first:"$timestamp"
},
send_message:{
$sum:{
$cond:[
{
$eq:[
"$event_type",
"send_message"
]
},
1,
0
]
}
},
clicked_cancel:{
$sum:{
$cond:[
{
$eq:[
"$event_type",
"clicked_cancel"
]
},
1,
0
]
}
},
clicked_ok:{
$sum:{
$cond:[
{
$eq:[
"$event_type",
"clicked_ok"
]
},
1,
0
]
}
}
}
},
{
$project:{
date:{
$dateToString:{
format:"%Y-%m-%d",
date:"$timestamp"
}
},
userId:1,
clicked_cancel:1,
send_message:1,
clicked_ok:1
}
}
])
Explanation:
keep only the documents for a specific day in the $match stage
group documents by userId and count the occurrences of each event in the $group stage
finally, format the timestamp field as yyyy-MM-dd in the $project stage
For the data you provided, this will output:
{
"_id":"123123123",
"send_message":0,
"clicked_cancel":1,
"clicked_ok":1,
"date":"2016-10-12"
}
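The three near-identical accumulators in the $group stage above could also be generated from a list of event types. This is only a sketch of that idea: the helper name countIf and the eventTypes list are made up for illustration, and the list has to be maintained by hand.

```javascript
// Hypothetical helper: accumulator that counts documents with the given event_type
const countIf = (eventType) => ({
  $sum: { $cond: [{ $eq: ["$event_type", eventType] }, 1, 0] }
});

// Event types taken from the question's expected output
const eventTypes = ["send_message", "clicked_cancel", "clicked_ok"];

const groupStage = {
  $group: {
    _id: "$userId",
    timestamp: { $first: "$timestamp" },
    // One counter field per event type
    ...Object.fromEntries(eventTypes.map((t) => [t, countIf(t)]))
  }
};

console.log(Object.keys(groupStage.$group).join(", "));
// _id, timestamp, send_message, clicked_cancel, clicked_ok
```

The resulting groupStage can replace the hand-written $group stage in the pipeline, with the $match and $project stages left as they are.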
Check the following query
db.sandbox.aggregate([{
$group: {
_id: {
userId: "$userId",
date: {
$dateToString: { format: "%Y-%m-%d", date: "$timestamp" }}
},
send_message: {
$sum: {
$cond: { if: { $eq: ["$event_type", "send_message"] }, then: 1, else: 0 } }
},
clicked_cancel: {
$sum: {
$cond: { if: { $eq: ["$event_type", "clicked_cancel"] }, then: 1, else: 0 }
}
},
clicked_ok: {
$sum: {
$cond: { if: { $eq: ["$event_type", "clicked_ok"] }, then: 1, else: 0 }
}
}
}
}])