Aggregation query which returns other fields - mongodb

I have data which looks like this.
{
"badgeId" : "ventura",
"date" : ISODate("2016-12-22T21:26:40.382+0000"),
"mistakes" : NumberInt(10)
}
{
"_id" : "a4usNGibIu",
"badgeId" : "dog",
"date" : ISODate("2016-12-21T21:26:40.382+0000"),
"mistakes" : NumberInt(10)
}
{
"_id" : ObjectId("580c77801d7723f3f7fe0e77"),
"badgeId" : "dog",
"date" : ISODate("2016-11-24T21:26:41.382+0000"),
"mistakes" : NumberInt(5)
}
I need the documents grouped by badgeId, returning the smallest mistakes value together with the date from that same document.
I cannot use $min, $max, $first or $last on the date in $group, because I need the date from the document where mistakes is lowest.
I tried the following query using $min, but it doesn't give the intended result, because it picks the minimum date independently of mistakes:
db.Badges.aggregate([
{
$match: otherMatchConditions
},
{
$group: {
_id: '$badgeId',
date: {
$min: '$date'
},
mistakes: {
$min: '$mistakes'
}
}
}
])

You can sort the documents by mistakes first and then take $first of both mistakes and date, so that both values come from the same document:
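db.Badges.aggregate([
// ascending, so within each badgeId group the document with the fewest mistakes comes first
{$sort: {mistakes: 1}},
{
$group: {
_id: '$badgeId',
date: {
$first: '$date'
},
mistakes: {
$first: '$mistakes'
}
}
}
])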
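To see why this keeps the two fields together, here is a plain-JavaScript model of the $sort + $group/$first stages. It is a sketch that runs outside MongoDB on the question's sample documents (_id fields omitted), and minMistakesPerBadge is a hypothetical name. Because everything is sorted by mistakes before grouping, the first document seen for each badgeId supplies both date and mistakes. If several documents share the lowest mistakes value, which one comes first is not guaranteed in MongoDB, so a secondary sort key such as {mistakes: 1, date: 1} makes the choice deterministic. On MongoDB 5.2 and later, the $top accumulator, e.g. { $top: { sortBy: { mistakes: 1 }, output: ["$date", "$mistakes"] } }, expresses the same idea inside a single $group.

```javascript
// Plain-JavaScript model (not MongoDB code) of:
// [{$sort: {mistakes: 1}},
//  {$group: {_id: "$badgeId", date: {$first: "$date"}, mistakes: {$first: "$mistakes"}}}]
function minMistakesPerBadge(docs) {
  // $sort: ascending by mistakes (Array.prototype.sort is stable, so ties keep input order)
  const sorted = [...docs].sort((a, b) => a.mistakes - b.mistakes);
  const groups = new Map();
  for (const doc of sorted) {
    // $group + $first: only the first document per badgeId is kept,
    // so date and mistakes always come from the same document
    if (!groups.has(doc.badgeId)) {
      groups.set(doc.badgeId, { _id: doc.badgeId, date: doc.date, mistakes: doc.mistakes });
    }
  }
  return [...groups.values()];
}

// The question's sample documents (_id fields omitted)
const badges = [
  { badgeId: "ventura", date: new Date("2016-12-22T21:26:40.382Z"), mistakes: 10 },
  { badgeId: "dog", date: new Date("2016-12-21T21:26:40.382Z"), mistakes: 10 },
  { badgeId: "dog", date: new Date("2016-11-24T21:26:41.382Z"), mistakes: 5 },
];

const result = minMistakesPerBadge(badges);
```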

I think your query is right, but the dates in the data as originally posted were invalid: in ISODate("2016-14-22T21:26:41.382+0000") and ISODate("2016-13-22T21:26:40.382+0000") the months are 14 and 13, which are not valid months.
Below is the corrected data after inserting it into MongoDB:
{
"_id" : ObjectId("580ca86c9a43fad551ba801f"),
"badgeId" : "ventura",
"date" : ISODate("2016-12-22T21:26:40.382Z"),
"mistakes" : 10
}
{
"_id" : "a4usNGibIu",
"badgeId" : "dog",
"date" : ISODate("2016-12-23T21:26:40.382Z"),
"mistakes" : 10
}
{
"_id" : ObjectId("580c77801d7723f3f7fe0e77"),
"badgeId" : "dog",
"date" : ISODate("2016-12-24T21:26:40.382Z"),
"mistakes" : 5
}
When I applied your query:
db.getCollection('COLLECTION_NAME').aggregate([
{
$group: {
_id: '$badgeId',
date: {
$min: '$date'
},
mistakes: {
$min: '$mistakes'
}
}
}
])
I got the following result:
{
"_id" : "dog",
"date" : ISODate("2016-12-23T21:26:40.382Z"),
"mistakes" : 5
}
{
"_id" : "ventura",
"date" : ISODate("2016-12-22T21:26:40.382Z"),
"mistakes" : 10
}
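This result shows the problem with independent accumulators: for dog, mistakes 5 belongs to the 2016-12-24 document, while the reported date 2016-12-23 comes from the other dog document, because each $min is evaluated on its own. The plain-JavaScript sketch below (not MongoDB code; the function names are hypothetical) models both behaviors on the corrected data: independentMins mirrors the two $min accumulators, and pairedMin keeps the whole document with the fewest mistakes, which is what $sort on mistakes followed by $first is meant to produce.

```javascript
// Corrected sample data from above (_id fields omitted)
const corrected = [
  { badgeId: "ventura", date: new Date("2016-12-22T21:26:40.382Z"), mistakes: 10 },
  { badgeId: "dog", date: new Date("2016-12-23T21:26:40.382Z"), mistakes: 10 },
  { badgeId: "dog", date: new Date("2016-12-24T21:26:40.382Z"), mistakes: 5 },
];

// Like {$group: {_id: "$badgeId", date: {$min: "$date"}, mistakes: {$min: "$mistakes"}}}:
// each minimum is tracked separately, so the fields can come from different documents.
function independentMins(docs) {
  const out = {};
  for (const d of docs) {
    const g = out[d.badgeId];
    if (!g) {
      out[d.badgeId] = { date: d.date, mistakes: d.mistakes };
    } else {
      if (d.date < g.date) g.date = d.date;
      if (d.mistakes < g.mistakes) g.mistakes = d.mistakes;
    }
  }
  return out;
}

// Keeps the whole document with the fewest mistakes per badgeId (first one wins on ties).
function pairedMin(docs) {
  const out = {};
  for (const d of docs) {
    const g = out[d.badgeId];
    if (!g || d.mistakes < g.mistakes) {
      out[d.badgeId] = { date: d.date, mistakes: d.mistakes };
    }
  }
  return out;
}

const mixed = independentMins(corrected);
const paired = pairedMin(corrected);
```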

Related

Get sum of a column from mongodb along with the list of another column

I have a collection "employees" with sample entries like these:
{
"_id" : ObjectId("62ccaa238a322322211"),
"employeeId" : "1234",
"date" : ISODate("2022-07-11T12:00:00.000+0000"),
"hours" : 15.0,
"createdBy" : "user1",
"createdDate" : ISODate("2023-02-19T21:54:27.213+0000"),
"updatedBy" : "user1",
"updatedDate" : ISODate("2023-02-19T21:54:27.213+0000"),
},
{
"_id" : ObjectId("62ccaa238a322388821"),
"employeeId" : "1234",
"date" : ISODate("2022-07-10T12:00:00.000+0000"),
"hours" : 25.0,
"createdBy" : "user1",
"createdDate" : ISODate("2023-02-19T22:54:27.213+0000"),
"updatedBy" : "user1",
"updatedDate" : ISODate("2023-02-19T22:54:27.213+0000"),
}
I am trying to get the sum of hours for each employee, along with the list of dates for those entries:
{
employeeId : "1234",
hours : 40, // sum of the hours from both entries
dates : ["2022-07-11", "2022-07-10"] // list of `date` column
}
I tried the query below, but I don't know how to adapt it to return the employeeId and the sum:
db.getCollection("employees").aggregate( [
{ $match : {
$and :[
{"employeeId" : "1234"},
{"date" : { $lte: new ISODate("2023-02-19") }}
]}},
{
$group:
{
"_id": {
"$dateToString": {
"format": "%Y-%m-%d",
"date": "$date"
}
},
totalAmount: { $sum: "$hours" }
}
}
] )
You should group by employeeId instead of date.
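db.getCollection("employees").aggregate([
{
$group: {
// one output document per employee
_id: "$employeeId",
// total hours across that employee's entries
hours: {
$sum: "$hours"
},
// $dateTrunc (MongoDB 5.0+) cuts each date down to midnight UTC
dates: {
$push: {
$dateTrunc: {
date: "$date",
unit: "day"
}
}
}
}
}
])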
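As a rough model of what this $group produces, here is a plain-JavaScript sketch run on the two sample entries (hoursPerEmployee is a hypothetical name, and only the fields the stage reads are included). Each employee gets one entry, $sum adds the hours, and $push collects every truncated date, duplicates included; $addToSet would drop duplicates instead. If you want strings like "2022-07-11", as in the expected output, $dateToString with format "%Y-%m-%d" is an alternative to $dateTrunc.

```javascript
// Plain-JavaScript model (not MongoDB code) of the $group stage above
function hoursPerEmployee(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // _id: "$employeeId" -> one group per employee
    if (!groups.has(doc.employeeId)) {
      groups.set(doc.employeeId, { _id: doc.employeeId, hours: 0, dates: [] });
    }
    const g = groups.get(doc.employeeId);
    g.hours += doc.hours; // $sum
    // $dateTrunc with unit "day" (UTC): keep the calendar day, drop the time
    const d = doc.date;
    g.dates.push(new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()))); // $push
  }
  return [...groups.values()];
}

// The two sample entries from the question, reduced to the fields the stage reads
const employees = [
  { employeeId: "1234", date: new Date("2022-07-11T12:00:00.000Z"), hours: 15.0 },
  { employeeId: "1234", date: new Date("2022-07-10T12:00:00.000Z"), hours: 25.0 },
];

const result = hoursPerEmployee(employees);
```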

Counting distinct number of users from beginning

I have a MongoDB aggregation pipeline that has been frustrating me for a while now, because its results never seem accurate or correct for my needs. The aim is to count the number of new unique users each day per chatbot, starting from the very beginning.
Here's what my pipeline looks like right now.
[
{
"$project" : {
"_id" : 0,
"bot_id" : 1,
"customer_id" : 1,
"timestamp" : {
"$ifNull" : [
'$incoming_log.created_at', '$outcome_log.created_at'
]
}
}
},
{
"$project" : {
"customer_id" : 1,
"bot_id" : 1,
"timestamp" : {
"$dateFromString" : {
"dateString" : {
"$substr" : [
"$timestamp", 0, 10
]
}
}
}
}
},
{
"$group" : {
"_id" : "$customer_id",
"timestamp" : {
"$first" : "$timestamp"
},
"bot_id" : {
"$addToSet" : "$bot_id"
}
}
},
{
"$unwind" : "$bot_id"
},
{
"$group" : {
"_id" : {
"bot_id" : "$bot_id",
"customer_id" : "$_id"
},
"timestamp" : {
"$first" : "$timestamp"
}
}
},
{
"$project" : {
"_id" : 0,
"timestamp" : 1,
"customer_id" : "$_id.customer_id",
"bot_id" : "$_id.bot_id"
}
},
{
"$group" : {
"_id": {
"timestamp" : "$timestamp",
"bot_id" : "$bot_id"
},
"new_users" : {
"$sum" : 1
}
}
},
{
"$project" : {
"_id" : 0,
"timestamp" : "$_id.timestamp",
"bot_id" : "$_id.bot_id",
"new_users" : 1
}
}
]
Here is a sample document, to give an idea of what the data looks like:
{
"mid" : "...",
"bot_id" : "...",
"bot_name" : "JOBBY",
"customer_id" : "U122...",
"incoming_log" : {
"created_at" : ISODate("2020-12-08T09:14:16.237Z"),
"event_payload" : "",
"event_type" : "text"
},
"outcome_log" : {
"created_at" : ISODate("2020-12-08T09:14:18.145Z"),
"distance" : 0.25,
"incoming_msg" : "🥺"
}
}
My expected outcome is something along the lines of:
{
"new_users" : 1187.0,
"timestamp" : ISODate("2021-01-27T00:00:00.000Z"),
"bot_id" : "5ffd......."
},
{
"new_users" : 1359.0,
"timestamp" : ISODate("2021-01-27T00:00:00.000Z"),
"bot_id" : "6def......."
}
Have I overcomplicated my pipeline somewhere? I seem to get a reasonable number of new users per bot each day, but for some reason my colleague tells me that the number is too high. I need some tips, please!
I really have no idea what you are looking for.
"The aim is to count the number of new unique users each day per chatbot, starting from the very beginning."
What is a "new unique user"? What do you mean by "starting from the very beginning"? You ask for a count per day, but you use {"$group": {"_id": "$customer_id", "timestamp": { "$first": "$timestamp" } } }
To me your grouping does not make sense. With only a single sample document, it is almost impossible to guess what you want to count.
Regarding grouping per day: I prefer to always work with Date values rather than strings; it is less error-prone. You may have to consider time zones, because UTC midnight is not your local midnight. When you work with Dates, you have better control over that.
The $project stages are redundant when you do a $group afterwards. Typically you have only one $project stage, at the end.
So, here is something to start with:
db.collection.aggregate([
{
// split the first available timestamp into date parts (UTC by default)
$set: {
day: {
$dateToParts: {
date: { $ifNull: ["$incoming_log.created_at", "$outcome_log.created_at"] }
}
}
}
},
{
// earliest day on which each customer appears
$group: {
_id: "$customer_id",
timestamp: {$min: { $dateFromParts: { year: "$day.year", month: "$day.month", day: "$day.day" } }}
}
}
]);
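This pipeline stops at the first day per customer. Below is a sketch of the remaining counting step, written as plain JavaScript rather than as a MongoDB pipeline. It rests on an assumption: a "new user" of a bot on a given day is a customer whose earliest message to that bot (in UTC) falls on that day. The function name and the sample messages are hypothetical. In pipeline terms it corresponds to grouping on { bot_id: "$bot_id", customer_id: "$customer_id" } with $min of the day, then grouping again on the day and "$_id.bot_id" with { $sum: 1 }. Using $min rather than $first matters: without a preceding $sort, $first returns whichever document happens to arrive first in the group, not the earliest one.

```javascript
// Hypothetical sample messages; only the fields the pipeline reads are kept.
const messages = [
  { bot_id: "botA", customer_id: "U1", incoming_log: { created_at: new Date("2020-12-08T09:14:16.237Z") } },
  { bot_id: "botA", customer_id: "U1", incoming_log: { created_at: new Date("2020-12-09T10:00:00.000Z") } },
  { bot_id: "botA", customer_id: "U2", outcome_log: { created_at: new Date("2020-12-09T11:00:00.000Z") } },
  { bot_id: "botB", customer_id: "U1", incoming_log: { created_at: new Date("2020-12-09T12:00:00.000Z") } },
];

// Midnight UTC of a date, like $dateToParts + $dateFromParts with their UTC default
function utcDay(d) {
  return new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()));
}

function newUsersPerDayPerBot(msgs) {
  // First $group: earliest day per (bot_id, customer_id) pair, using $min
  const firstSeen = new Map();
  for (const m of msgs) {
    // $ifNull: fall back to outcome_log when incoming_log has no timestamp
    const ts = m.incoming_log?.created_at ?? m.outcome_log?.created_at;
    if (ts == null) continue; // skip messages with no timestamp at all
    const day = utcDay(ts);
    const key = JSON.stringify([m.bot_id, m.customer_id]);
    const prev = firstSeen.get(key);
    if (!prev || day < prev.day) firstSeen.set(key, { bot_id: m.bot_id, day });
  }
  // Second $group: count customers per (first day, bot_id) with $sum: 1
  const counts = new Map();
  for (const { bot_id, day } of firstSeen.values()) {
    const key = JSON.stringify([day.toISOString(), bot_id]);
    const entry = counts.get(key) ?? { timestamp: day, bot_id, new_users: 0 };
    entry.new_users += 1;
    counts.set(key, entry);
  }
  return [...counts.values()];
}

const result = newUsersPerDayPerBot(messages);
```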

How to get the difference of two ISO String dates with an aggregation MongoDB query?

I have a collection called biosignals in my MongoDB, in which there exist entries that represent physical activity (e.g. walking).
Each such entry has a 'start_date_time' and an 'end_date_time', which are ISO strings (e.g. 2017-04-26T07:12:09.463Z).
I want to run the following query, which groups the physical activity entries by day and calculates the total duration of activity for each day.
db.biosignals.aggregate([
{
$match: {
"name": "physical-activity"
}
},
{
$project: {
duration: {
"$subtract": [new Date("$end_date_time"), new Date("$start_date_time")]
},
date: {
$substr: [ "$start_date_time", 0, 10]
}
}
},
{
$group: {
_id: "$date",
total: { $sum: "$duration" }
}
},
{
$sort: {
_id: 1
}
}
])
However, I only get 0 as a result, as shown here:
{ "_id" : "2017-04-24", "total" : NumberLong(0) }
{ "_id" : "2017-04-25", "total" : NumberLong(0) }
{ "_id" : "2017-04-26", "total" : NumberLong(0) }
{ "_id" : "2017-04-27", "total" : NumberLong(0) }
If, instead, I hardcode the dates (e.g. start_date_time = 2017-04-26T07:12:08.463Z and end_date_time = 2017-04-26T07:12:09.463Z, a one-second difference), I get the expected result:
{ "_id" : "2017-04-24", "total" : NumberLong(16000) }
{ "_id" : "2017-04-25", "total" : NumberLong(3000) }
{ "_id" : "2017-04-26", "total" : NumberLong(7000) }
{ "_id" : "2017-04-27", "total" : NumberLong(12000) }
How could I fix that?
Thank you very much!
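The likely cause is that new Date("$end_date_time") is ordinary JavaScript: the shell evaluates it once, while building the pipeline, on the literal text "$end_date_time", so the server never receives a reference to the field. The conversion has to happen on the server instead, for example { $subtract: [ { $toDate: "$end_date_time" }, { $toDate: "$start_date_time" } ] } ($toDate needs MongoDB 4.0 or later; $dateFromString is available from 3.6). The plain-JavaScript sketch below only illustrates these two points and is not MongoDB code: the client-side call yields an invalid date, while parsing each entry's own strings gives the per-day millisecond totals the pipeline should produce. The function name and the sample entries are hypothetical.

```javascript
// Evaluated once by JavaScript, with the literal string as input:
// no document field is read here.
const clientSide = new Date("$end_date_time");
const isInvalid = Number.isNaN(clientSide.getTime()); // "Invalid Date"

// Per-entry logic the server-side pipeline is meant to perform,
// modelled in plain JavaScript on hypothetical entries.
function totalDurationPerDay(entries) {
  const totals = {};
  for (const e of entries) {
    // like $subtract of two converted dates: a difference in milliseconds
    const ms = Date.parse(e.end_date_time) - Date.parse(e.start_date_time);
    // like $substr: ["$start_date_time", 0, 10] -> "YYYY-MM-DD"
    const day = e.start_date_time.slice(0, 10);
    totals[day] = (totals[day] ?? 0) + ms;
  }
  return totals;
}

const entries = [
  { start_date_time: "2017-04-26T07:12:08.463Z", end_date_time: "2017-04-26T07:12:09.463Z" },
  { start_date_time: "2017-04-26T08:00:00.000Z", end_date_time: "2017-04-26T08:00:02.000Z" },
  { start_date_time: "2017-04-27T09:00:00.000Z", end_date_time: "2017-04-27T09:00:05.000Z" },
];
```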

Group based on discrete date ranges

I am new to MongoDB and I've been struggling to get a specific query to work without any luck.
I have a collection with millions of documents, each with a date and an amount, and I want to aggregate them over specific periods of time.
For example, I want the count and the sum of the amounts for the periods 1/1/2015 - 15/1/2015 and 1/2/2015 - 15/2/2015.
Here is a sample collection:
{ "_id" : "148404972864202083547392254", "account" : "3600", "amount" : 50, "date" : ISODate("2017-01-01T12:02:08.642Z")}
{ "_id" : "148404972864202085437392254", "account" : "3600", "amount" : 50, "date" : ISODate("2017-01-03T12:02:08.642Z")}
{ "_id" : "148404372864202083547392254", "account" : "3600", "amount" : 70, "date" : ISODate("2017-01-09T12:02:08.642Z")}
{ "_id" : "148404972864202083547342254", "account" : "3600", "amount" : 150, "date" : ISODate("2017-01-22T12:02:08.642Z")}
{ "_id" : "148404922864202083547392254", "account" : "3600", "amount" : 200, "date" : ISODate("2017-02-02T12:02:08.642Z")}
{ "_id" : "148404972155502083547392254", "account" : "3600", "amount" : 30, "date" : ISODate("2017-02-07T12:02:08.642Z")}
{ "_id" : "148404972864202122254732254", "account" : "3600", "amount" : 10, "date" : ISODate("2017-02-10T12:02:08.642Z")}
For the date ranges 1/1/2017 - 10/10/2017 and 1/2/2017 - 10/2/2017, the output would look like this:
1/1/2017 - 10/1/2017 - count =3, amount summation: 170
10/2/2017 - 15/2/2017 - count =2, amount summation: 40
Is it possible to work with arbitrary date ranges like these? The code will be in Java, but could someone help me with an example in the mongo shell?
There must be a more elegant solution than this. Anyway, you can wrap it in a function and generalize the date-related arguments.
First, project the documents while deciding which range each item falls into (note the huge $switch expression). By default, an item goes into the null range.
Then, filter out the items that didn't match any range (i.e. keep only range != null).
The very last step is to group items by the range and make all needed calculations.
db.items.aggregate([
{ $project : {
amount : true,
account : true,
date : true,
range : {
$switch : {
branches : [
{
case : {
$and : [
{ $gte : [ "$date", ISODate("2017-01-01T00:00:00.000Z") ] },
{ $lt : [ "$date", ISODate("2017-01-10T00:00:00.000Z") ] }
]
},
then : { $concat : [
{ $dateToString: { format: "%d/%m/%Y", date: ISODate("2017-01-01T00:00:00.000Z") } },
{ $literal : " - " },
{ $dateToString: { format: "%d/%m/%Y", date: ISODate("2017-01-10T00:00:00.000Z") } }
] }
},
{
case : {
$and : [
{ $gte : [ "$date", ISODate("2017-02-01T00:00:00.000Z") ] },
{ $lt : [ "$date", ISODate("2017-02-10T00:00:00.000Z") ] }
]
},
then : { $concat : [
{ $dateToString: { format: "%d/%m/%Y", date: ISODate("2017-02-01T00:00:00.000Z") } },
{ $literal : " - " },
{ $dateToString: { format: "%d/%m/%Y", date: ISODate("2017-02-10T00:00:00.000Z") } }
] }
}
],
default : null
}
}
} },
{ $match : { range : { $ne : null } } },
{ $group : {
_id : "$range",
count : { $sum : 1 },
"amount summation" : { $sum : "$amount" }
} }
])
Based on your data it will give the following results*:
{ "_id" : "01/02/2017 - 10/02/2017", "count" : 2, "amount summation" : 230 }
{ "_id" : "01/01/2017 - 10/01/2017", "count" : 3, "amount summation" : 170 }
*I believe there are a few typos in your question, which is why the data look different.
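Following the suggestion to wrap this in a function, here is one way to generate the same pipeline from a list of ranges in plain JavaScript. It is a sketch, not a tested library: buildRangePipeline, formatDay and pad2 are hypothetical names, each range is assumed to be half-open ([from, to), matching the $gte/$lt pair above), and the labels are computed on the client instead of with $dateToString. The returned array should be usable with db.items.aggregate(...) in the shell; the same stage documents could be assembled with the Java driver, but that is not shown here. For contiguous boundaries, the $bucket stage (MongoDB 3.4+) is another option, though the gap between two ranges would then show up as a bucket of its own.

```javascript
// Hypothetical helper: builds the $project/$match/$group pipeline above
// for any list of half-open date ranges [from, to).
function pad2(n) {
  return String(n).padStart(2, "0");
}

// dd/mm/yyyy in UTC, like $dateToString with format "%d/%m/%Y" (UTC by default)
function formatDay(d) {
  return `${pad2(d.getUTCDate())}/${pad2(d.getUTCMonth() + 1)}/${d.getUTCFullYear()}`;
}

function buildRangePipeline(ranges) {
  // One $switch branch per range; the label is a precomputed string literal
  const branches = ranges.map(({ from, to }) => ({
    case: {
      $and: [{ $gte: ["$date", from] }, { $lt: ["$date", to] }],
    },
    then: `${formatDay(from)} - ${formatDay(to)}`,
  }));

  return [
    {
      $project: {
        amount: true,
        account: true,
        date: true,
        // Documents outside every range get null
        range: { $switch: { branches, default: null } },
      },
    },
    // Drop documents that fell into no range
    { $match: { range: { $ne: null } } },
    {
      $group: {
        _id: "$range",
        count: { $sum: 1 },
        "amount summation": { $sum: "$amount" },
      },
    },
  ];
}

// Usage in the shell (same two ranges as above):
// db.items.aggregate(buildRangePipeline([
//   { from: new Date("2017-01-01T00:00:00Z"), to: new Date("2017-01-10T00:00:00Z") },
//   { from: new Date("2017-02-01T00:00:00Z"), to: new Date("2017-02-10T00:00:00Z") },
// ]));
```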

MongoDB $sum and $avg of sub documents

I need to get the $sum and $avg of subdocuments: I would like the $sum and $avg of Channels[0] and of the other channels as well.
My data structure looks like this:
{
_id : ...,
Location : 1,
Channels : [
{ _id: ...,
Value: 25
},
{
_id: ... ,
Value: 39
},
{
_id: ..,
Value: 12
}
]
}
To get the sum and average of the Channels.Value elements for each document in your collection, you need to use MongoDB's aggregation framework. Since Channels is an array, you will also need the $unwind operator to deconstruct it.
Assuming that your collection is called example, here's how you could get both the per-document sum and average of Channels.Value:
db.example.aggregate( [
{
"$unwind" : "$Channels"
},
{
"$group" : {
"_id" : "$_id",
"documentSum" : { "$sum" : "$Channels.Value" },
"documentAvg" : { "$avg" : "$Channels.Value" }
}
}
] )
The output from your post's data would be:
{
"_id" : SomeObjectIdValue,
"documentSum" : 76,
"documentAvg" : 25.333333333333332
}
If you have more than one document in your collection then you will see a result row for each document containing a Channels array.
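For reference, here is a plain-JavaScript model of what $unwind followed by this $group computes. It is a sketch outside MongoDB, and sumAvgPerDocument and the "doc1" _id are hypothetical. $unwind emits one row per array element, and by default documents whose Channels array is missing or empty produce no rows at all. On MongoDB 3.2 and later, $sum and $avg in a $project stage also accept an array expression, so { $project: { documentSum: { $sum: "$Channels.Value" }, documentAvg: { $avg: "$Channels.Value" } } } gives the same per-document figures without $unwind for documents that have channels.

```javascript
// Plain-JavaScript model (not MongoDB code) of:
// [{$unwind: "$Channels"},
//  {$group: {_id: "$_id", documentSum: {$sum: "$Channels.Value"}, documentAvg: {$avg: "$Channels.Value"}}}]
function sumAvgPerDocument(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // $unwind: one row per array element; a missing or empty array yields no rows
    for (const channel of doc.Channels ?? []) {
      const g = groups.get(doc._id) ?? { _id: doc._id, sum: 0, count: 0 };
      g.sum += channel.Value;
      g.count += 1;
      groups.set(doc._id, g);
    }
  }
  // $sum and $avg per original document
  return [...groups.values()].map(({ _id, sum, count }) => ({
    _id,
    documentSum: sum,
    documentAvg: sum / count,
  }));
}

// The question's document, with a hypothetical _id and the channel _ids omitted
const docs = [{ _id: "doc1", Location: 1, Channels: [{ Value: 25 }, { Value: 39 }, { Value: 12 }] }];
const result = sumAvgPerDocument(docs);
```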
Solution 1: using two $group stages, based on an example from a previous question:
db.records.aggregate(
[
{ $unwind: "$Channels" },
{ $group: {
_id: {
"loc" : "$Location",
"cId" : "$Channels.Id"
},
"value" : {$sum : "$Channels.Value" },
"average" : {$avg : "$Channels.Value"},
"maximum" : {$max : "$Channels.Value"},
"minimum" : {$min : "$Channels.Value"}
}},
{ $group: {
_id : "$_id.loc",
"ChannelsSummary" : { $push :
{ "channelId" : '$_id.cId',
"value" :'$value',
"average" : '$average',
"maximum" : '$maximum',
"minimum" : '$minimum'
}}
}
}
]
)
Solution 2:
There is a property I didn't show in my original question that might help: "Channels.Id", which is independent of "Channels._id".
db.records.aggregate( [
{
"$unwind" : "$Channels"
},
{
"$group" : {
"_id" : "$Channels.Id",
"documentSum" : { "$sum" : "$Channels.Value" },
"documentAvg" : { "$avg" : "$Channels.Value" }
}
}
] )