How to get the difference of two ISO String dates with an aggregation MongoDB query? - mongodb

I have a collection called biosignals in my MongoDB, in which there exist entries that represent physical activity (e.g. walking).
Each such entry has a 'start_date_time' and an 'end_date_time', which are ISO strings (e.g. 2017-04-26T07:12:09.463Z).
I want to do the following query, where I group the physical activity entries by day and I calculate the total duration of activity for each day.
db.biosignals.aggregate([
{
$match: {
"name": "physical-activity"
}
},
{
$project: {
duration: {
"$subtract": [new Date("$end_date_time"), new Date("$start_date_time")]
},
date: {
$substr: [ "$start_date_time", 0, 10]
}
}
},
{
$group: {
_id: "$date",
total: { $sum: "$duration" }
}
},
{
$sort: {
_id: 1
}
}
])
However, I only get 0 as a result, as shown here:
{ "_id" : "2017-04-24", "total" : NumberLong(0) }
{ "_id" : "2017-04-25", "total" : NumberLong(0) }
{ "_id" : "2017-04-26", "total" : NumberLong(0) }
{ "_id" : "2017-04-27", "total" : NumberLong(0) }
If, instead, I hardcode the dates (e.g. start_date_time = 2017-04-26T07:12:08.463Z and end_date_time = 2017-04-26T07:12:09.463Z, that is one second difference), I get the expected result:
{ "_id" : "2017-04-24", "total" : NumberLong(16000) }
{ "_id" : "2017-04-25", "total" : NumberLong(3000) }
{ "_id" : "2017-04-26", "total" : NumberLong(7000) }
{ "_id" : "2017-04-27", "total" : NumberLong(12000) }
How could I fix that?
Thank you very much!
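A likely explanation, offered as a sketch rather than a confirmed diagnosis: new Date("$end_date_time") is evaluated by the shell's JavaScript before the pipeline is sent, so the server never sees a field reference, only a constant built from the literal text "$end_date_time". Both operands of $subtract are then the same constant, which would explain the 0. Assuming MongoDB 3.6 or later, $dateFromString can convert the stored strings on the server instead. The sumDurationsPerDay helper and the sample documents below are hypothetical; they only mirror the pipeline in plain JavaScript so the arithmetic can be checked without a database.

```javascript
// Client side: the text "$end_date_time" is parsed as a date literal,
// not resolved as a field path, so the result is an invalid Date.
const clientSide = new Date("$end_date_time");
console.log(Number.isNaN(clientSide.getTime()));

// Server-side conversion (assumes MongoDB 3.6+ for $dateFromString).
const pipeline = [
  { $match: { name: "physical-activity" } },
  {
    $project: {
      // $subtract on two dates yields the difference in milliseconds
      duration: {
        $subtract: [
          { $dateFromString: { dateString: "$end_date_time" } },
          { $dateFromString: { dateString: "$start_date_time" } },
        ],
      },
      // first 10 characters of the ISO string, i.e. YYYY-MM-DD
      date: { $substr: ["$start_date_time", 0, 10] },
    },
  },
  { $group: { _id: "$date", total: { $sum: "$duration" } } },
  { $sort: { _id: 1 } },
];
// In the mongo shell: db.biosignals.aggregate(pipeline)

// Hypothetical plain-JavaScript mirror of the pipeline, for checking only.
function sumDurationsPerDay(docs) {
  const totals = new Map();
  for (const doc of docs) {
    if (doc.name !== "physical-activity") continue;
    const day = doc.start_date_time.slice(0, 10);
    const ms = new Date(doc.end_date_time) - new Date(doc.start_date_time);
    totals.set(day, (totals.get(day) || 0) + ms);
  }
  return [...totals]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([_id, total]) => ({ _id, total }));
}

// Made-up documents in the shape described in the question.
const sample = [
  { name: "physical-activity", start_date_time: "2017-04-26T07:12:08.463Z", end_date_time: "2017-04-26T07:12:09.463Z" },
  { name: "physical-activity", start_date_time: "2017-04-26T09:00:00.000Z", end_date_time: "2017-04-26T09:00:02.000Z" },
  { name: "physical-activity", start_date_time: "2017-04-27T10:00:00.000Z", end_date_time: "2017-04-27T10:00:05.000Z" },
];
console.log(sumDurationsPerDay(sample));
```

The totals are in milliseconds, as in the hardcoded example from the question; divide by 1000 in a later $project if seconds are wanted.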

Related

Counting distinct number of users from beginning

I have a MongoDB aggregation pipeline that has been frustrating me for a while now, because it never seems to be accurate or correct to my needs. The aim is to count the number of new unique users each day per chatbot, starting from the very beginning.
Here's what my pipeline looks like right now.
[
{
"$project" : {
"_id" : 0,
"bot_id" : 1,
"customer_id" : 1,
"timestamp" : {
"$ifNull" : [
'$incoming_log.created_at', '$outcome_log.created_at'
]
}
}
},
{
"$project" : {
"customer_id" : 1,
"bot_id" : 1,
"timestamp" : {
"$dateFromString" : {
"dateString" : {
"$substr" : [
"$timestamp", 0, 10
]
}
}
}
}
},
{
"$group" : {
"_id" : "$customer_id",
"timestamp" : {
"$first" : "$timestamp"
},
"bot_id" : {
"$addToSet" : "$bot_id"
}
}
},
{
"$unwind" : "$bot_id"
},
{
"$group" : {
"_id" : {
"bot_id" : "$bot_id",
"customer_id" : "$_id"
},
"timestamp" : {
"$first" : "$timestamp"
}
}
},
{
"$project" : {
"_id" : 0,
"timestamp" : 1,
"customer_id" : "$_id.customer_id",
"bot_id" : "$_id.bot_id"
}
},
{
"$group" : {
"_id": {
"timestamp" : "$timestamp",
"bot_id" : "$bot_id"
},
"new_users" : {
"$sum" : 1
}
}
},
{
"$project" : {
"_id" : 0,
"timestamp" : "$_id.timestamp",
"bot_id" : "$_id.bot_id",
"new_users" : 1
}
}
]
Some sample data for an idea of what the data looks like...
{
"mid" : "...",
"bot_id" : "...",
"bot_name" : "JOBBY",
"customer_id" : "U122...",
"incoming_log" : {
"created_at" : ISODate("2020-12-08T09:14:16.237Z"),
"event_payload" : "",
"event_type" : "text"
},
"outcome_log" : {
"created_at" : ISODate("2020-12-08T09:14:18.145Z"),
"distance" : 0.25,
"incoming_msg" : "🥺"
}
}
My expected outcome is something along the lines of:
{
"new_users" : 1187.0,
"timestamp" : ISODate("2021-01-27T00:00:00.000Z"),
"bot_id" : "5ffd......."
},
{
"new_users" : 1359.0,
"timestamp" : ISODate("2021-01-27T00:00:00.000Z"),
"bot_id" : "6def......."
}
Have I overcomplicated my pipeline somewhere? I seem to get a reasonable number of new users per bot each day, but for some reason my colleague tells me that the number is too high. I need some tips, please!
I really have no idea what you are looking for.
"The aim is to count the number of new unique users each day per chatbot, starting from the very beginning."
What is "new unique users"? What do you mean by "starting from the very beginning"? You ask for count per day but you use {"$group": {"_id": "$customer_id", "timestamp": { "$first": "$timestamp" } } }
To me, your grouping does not make any sense. With only a single sample document, it is almost impossible to guess what you would like to count.
Regarding group per day: I prefer to work always with Date values, rather than strings. It is less error prone. Maybe you have to consider time zones, because UTC midnight is not your local midnight. When you work with Dates then you have better control over it.
The $project stages are useless when you do $group afterwards. Typically you have only one $project stage at the end.
So, here is something to start with:
db.collection.aggregate([
{
$set: {
day: {
$dateToParts: {
date: { $ifNull: ["$incoming_log.created_at", "$outcome_log.created_at"] }
}
}
}
},
{
$group: {
_id: "$customer_id",
timestamp: {$min: { $dateFromParts: { year: "$day.year", month: "$day.month", day: "$day.day" } }}
}
}
]);
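One possible way to extend this starting point, under the assumption that a user counts as "new" for a bot on the day of their first message to that bot: take the earliest timestamp per (bot_id, customer_id) pair, truncate it to UTC midnight, and count pairs per bot and day. The countNewUsersPerDay helper and the sample documents are hypothetical and only mirror the pipeline in plain JavaScript.

```javascript
const pipeline = [
  { $set: { ts: { $ifNull: ["$incoming_log.created_at", "$outcome_log.created_at"] } } },
  // earliest timestamp per (bot, customer) pair
  {
    $group: {
      _id: { bot_id: "$bot_id", customer_id: "$customer_id" },
      firstSeen: { $min: "$ts" },
    },
  },
  // truncate to UTC midnight, then count pairs per bot and day
  {
    $group: {
      _id: {
        bot_id: "$_id.bot_id",
        day: {
          $dateFromParts: {
            year: { $year: "$firstSeen" },
            month: { $month: "$firstSeen" },
            day: { $dayOfMonth: "$firstSeen" },
          },
        },
      },
      new_users: { $sum: 1 },
    },
  },
  { $project: { _id: 0, bot_id: "$_id.bot_id", timestamp: "$_id.day", new_users: 1 } },
  { $sort: { timestamp: 1, bot_id: 1 } },
];
// In the shell: db.collection.aggregate(pipeline)

// Hypothetical plain-JavaScript mirror of the pipeline, for checking only.
function countNewUsersPerDay(docs) {
  const firstSeen = new Map(); // JSON [bot, customer] -> earliest Date
  for (const d of docs) {
    const ts = d.incoming_log?.created_at ?? d.outcome_log?.created_at;
    const key = JSON.stringify([d.bot_id, d.customer_id]);
    if (!firstSeen.has(key) || ts < firstSeen.get(key)) firstSeen.set(key, ts);
  }
  const counts = new Map(); // JSON [day, bot] -> number of new users
  for (const [key, ts] of firstSeen) {
    const [botId] = JSON.parse(key);
    const bucket = JSON.stringify([ts.toISOString().slice(0, 10), botId]);
    counts.set(bucket, (counts.get(bucket) || 0) + 1);
  }
  return [...counts.keys()].sort().map((bucket) => {
    const [day, bot_id] = JSON.parse(bucket);
    return { bot_id, day, new_users: counts.get(bucket) };
  });
}

// Made-up documents: u1 talks to bot A on two days, and to bot B once.
const sample = [
  { bot_id: "A", customer_id: "u1", incoming_log: { created_at: new Date("2021-01-26T09:00:00Z") } },
  { bot_id: "A", customer_id: "u1", incoming_log: { created_at: new Date("2021-01-27T10:00:00Z") } },
  { bot_id: "A", customer_id: "u2", outcome_log: { created_at: new Date("2021-01-27T11:00:00Z") } },
  { bot_id: "B", customer_id: "u1", incoming_log: { created_at: new Date("2021-01-27T12:00:00Z") } },
];
console.log(countNewUsersPerDay(sample));
```

Grouping on the pair rather than on customer_id alone means a customer's first day is tracked separately for each bot, which may or may not be the definition your colleague has in mind.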

MongoDB get user which are new today

I am trying to find the list of users who are new on day-1. I have written one query to find the users who arrived up to the day before yesterday and another for the users who arrived yesterday. Now I want to subtract the first list from the second. How can I do that in a single aggregate function?
Query to get the list up to the day before yesterday:
db.chat_question_logs.aggregate([
{
$match : {"createdDate":{$lte: ISODate("2020-04-29T00:00:00Z")}}
},
{
"$project" :
{
_id : 0,
"userInfo.userId":1
}
},
{
"$group": {
"_id": {userId:"$userInfo.userId"},"count": {$sum : 1}}
}
])
Similarly, the query for day-1 is as follows:
db.chat_question_logs.aggregate([
{
$match : {"createdDate":{$gte: ISODate("2020-04-30T00:00:00Z"),$lte: ISODate("2020-05-01T00:00:00Z")}}
},
{
"$project" :
{
_id : 0,
"userInfo.userId":1
}
},
{
"$group": {
"_id": {userId:"$userInfo.userId"},"count": {$sum : 1}}
}
])
The resulting JSON is as follows:
/* 1 */
{
"_id" : {
"userId" : "2350202241750776"
},
"count" : 1
},
/* 2 */
{
"_id" : {
"userId" : "26291570771793121"
},
"count" : 1
},
/* 3 */
{
"_id" : {
"userId" : "2742872209107866"
},
"count" : 5
},
/* 4 */
{
"_id" : {
"userId" : "23502022417507761212"
},
"count" : 1
},
/* 5 */
{
"_id" : {
"userId" : "2629157077179312"
},
"count" : 43
}
How can I find the difference?
It sounds like what you want is to get all users created yesterday (which is the 28th in this example).
db.chat_question_logs.aggregate([
{
$match : { $and: [
{ "createdDate":{$lt: ISODate("2020-04-29T00:00:00Z")} },
{ "createdDate": {$gte: ISODate("2020-04-28T00:00:00Z") }}
] }
},
{
"$project" :
{
_id : 0,
"userInfo.userId":1
}
},
{
"$group": {
"_id": {userId:"$userInfo.userId"},"count": {$sum : 1}}
}
])
Is this what you want?
Hi, I found a solution, shown below.
I group by user ID, take the first appearance of each ID, and then filter the records on the date range I want. The query is as follows:
db.chat_question_logs.aggregate([
{
$group:
{
_id: "$userInfo.userId",
firstApprance: { $first: "$createdDate" }
}
},
{
$match : { "firstApprance": { $gte: new ISODate("2020-05-03"), $lt: new ISODate("2020-05-05") } }
}
])
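One caveat with the query above: $first in $group returns the first document in whatever order the documents reach the stage, so without a preceding $sort the "first appearance" is not guaranteed to be the earliest date. A variant that uses $min does not depend on order. The following is a minimal sketch; newUsersBetween and the sample logs are hypothetical and mirror the pipeline in plain JavaScript.

```javascript
const from = new Date("2020-05-03T00:00:00Z");
const to = new Date("2020-05-05T00:00:00Z");

const pipeline = [
  // earliest createdDate per user, regardless of document order
  { $group: { _id: "$userInfo.userId", firstAppearance: { $min: "$createdDate" } } },
  // keep users whose first appearance falls in [from, to)
  { $match: { firstAppearance: { $gte: from, $lt: to } } },
];
// In the shell: db.chat_question_logs.aggregate(pipeline)

// Hypothetical plain-JavaScript mirror of the pipeline, for checking only.
function newUsersBetween(docs, start, end) {
  const first = new Map(); // userId -> earliest createdDate
  for (const d of docs) {
    const id = d.userInfo.userId;
    if (!first.has(id) || d.createdDate < first.get(id)) first.set(id, d.createdDate);
  }
  return [...first]
    .filter(([, t]) => t >= start && t < end)
    .map(([id]) => id)
    .sort();
}

// Made-up logs; user "a" has an older entry that appears later in the list.
const logs = [
  { userInfo: { userId: "a" }, createdDate: new Date("2020-05-04T08:00:00Z") },
  { userInfo: { userId: "a" }, createdDate: new Date("2020-04-20T08:00:00Z") },
  { userInfo: { userId: "b" }, createdDate: new Date("2020-05-03T12:00:00Z") },
  { userInfo: { userId: "c" }, createdDate: new Date("2020-05-05T00:00:00Z") },
];
console.log(newUsersBetween(logs, from, to));
```

With these logs only "b" qualifies: "a" first appeared in April, and "c" falls exactly on the excluded upper bound.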

Mongodb aggregate by day and delete duplicate value

I'm trying to clean a huge database.
Sample DB :
{
"_id" : ObjectId("59fc5249d5ab401d99f3de7f"),
"addedAt" : ISODate("2017-11-03T11:26:01.744Z"),
"__v" : 0,
"check" : 17602,
"lastCheck" : ISODate("2018-04-05T11:47:00.609Z"),
"tracking" : [
{
"timeCheck" : ISODate("2017-11-06T13:17:20.861Z"),
"_id" : ObjectId("5a0060e00f3c330012bafe39"),
"rank" : 2395,
},
{
"timeCheck" : ISODate("2017-11-06T13:22:31.254Z"),
"_id" : ObjectId("5a0062170f3c330012bafe77"),
"rank" : 2395,
},
{
"timeCheck" : ISODate("2017-11-06T13:27:40.551Z"),
"_id" : ObjectId("5a00634c0f3c330012bafebe"),
"rank" : 2379,
},
{
"timeCheck" : ISODate("2017-11-06T13:32:41.084Z"),
"_id" : ObjectId("5a0064790f3c330012baff03"),
"rank" : 2395,
},
{
"timeCheck" : ISODate("2017-11-06T13:37:51.012Z"),
"_id" : ObjectId("5a0065af0f3c330012baff32"),
"rank" : 2379,
},
{
"timeCheck" : ISODate("2017-11-07T13:37:51.012Z"),
"_id" : ObjectId("5a0065af0f3c330012baff34"),
"rank" : 2379,
}]
}
I have a lot of duplicate values, but I only need to remove duplicates within the same day.
For example, to obtain this:
{
"_id" : ObjectId("59fc5249d5ab401d99f3de7f"),
"addedAt" : ISODate("2017-11-03T11:26:01.744Z"),
"__v" : 0,
"check" : 17602,
"lastCheck" : ISODate("2018-04-05T11:47:00.609Z"),
"tracking" : [
{
"timeCheck" : ISODate("2017-11-06T13:17:20.861Z"),
"_id" : ObjectId("5a0060e00f3c330012bafe39"),
"rank" : 2395,
},
{
"timeCheck" : ISODate("2017-11-06T13:27:40.551Z"),
"_id" : ObjectId("5a00634c0f3c330012bafebe"),
"rank" : 2379,
},
{
"timeCheck" : ISODate("2017-11-07T13:37:51.012Z"),
"_id" : ObjectId("5a0065af0f3c330012baff34"),
"rank" : 2379,
}]
}
How can I aggregate by day and then delete the later duplicate values?
I need to keep the values for each day, even if they are identical to values from another day.
The aggregation framework cannot update data at this stage. However, you can use the following aggregation pipeline in order to get the desired output and then use e.g. a bulk replace to update all your documents:
db.collection.aggregate({
$unwind: "$tracking" // flatten the "tracking" array into separate documents
}, {
$sort: {
"tracking.timeCheck": 1 // sort by timeCheck to allow us to use the $first operator in the next stage reliably
}
}, {
$group: {
_id: { // group by
"_id": "$_id", // "_id" and
"rank": "$tracking.rank", // "rank" and
"date": { // the "date" part of the "timeCheck" field
$dateFromParts : {
year: { $year: "$tracking.timeCheck" },
month: { $month: "$tracking.timeCheck" },
day: { $dayOfMonth: "$tracking.timeCheck" }
}
}
},
"doc": { $first: "$$ROOT" } // only keep the first document per group
}
}, {
$sort: {
"doc.tracking.timeCheck": 1 // restore ascending sort order - may or may not be needed...
}
}, {
$group: {
_id: "$_id._id", // merge everything again per "_id"
"addedAt": { $first: "$doc.addedAt" },
"__v": { $first: "$doc.__v" },
"check": { $first: "$doc.check" },
"lastCheck": { $first: "$doc.lastCheck" },
"tracking": { $push: "$doc.tracking" } // in order to join the tracking values into an array again
}
})
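The bulk replace step mentioned above could look roughly like this. It is a sketch that assumes the pipeline output has been collected into an array called results (for example with .toArray()); toReplaceOps is a hypothetical helper name, and the string _id stands in for an ObjectId so the snippet runs without a database.

```javascript
// Turn each aggregated document into a replaceOne operation for bulkWrite.
function toReplaceOps(results) {
  return results.map((doc) => ({
    replaceOne: { filter: { _id: doc._id }, replacement: doc },
  }));
}

// Hypothetical aggregation output in the shape produced by the final $group.
const results = [
  {
    _id: "59fc5249d5ab401d99f3de7f",
    addedAt: new Date("2017-11-03T11:26:01.744Z"),
    __v: 0,
    check: 17602,
    lastCheck: new Date("2018-04-05T11:47:00.609Z"),
    tracking: [
      { timeCheck: new Date("2017-11-06T13:17:20.861Z"), rank: 2395 },
      { timeCheck: new Date("2017-11-06T13:27:40.551Z"), rank: 2379 },
      { timeCheck: new Date("2017-11-07T13:37:51.012Z"), rank: 2379 },
    ],
  },
];

const ops = toReplaceOps(results);
console.log(JSON.stringify(ops, null, 2));
// In the shell, with real pipeline output in `results`:
//   db.collection.bulkWrite(toReplaceOps(results))
```

Because replaceOne swaps the whole document, any field that the final $group does not carry through would be lost, so it is worth checking the aggregated output against the original documents before running the write.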

Aggregation query which returns other fields

I have data which looks like this.
{
"badgeId" : "ventura",
"date" : ISODate("2016-12-22T21:26:40.382+0000"),
"mistakes" : NumberInt(10)
}
{
"_id" : "a4usNGibIu",
"badgeId" : "dog",
"date" : ISODate("2016-12-21T21:26:40.382+0000"),
"mistakes" : NumberInt(10)
}
{
"_id" : ObjectId("580c77801d7723f3f7fe0e77"),
"badgeId" : "dog",
"date" : ISODate("2016-11-24T21:26:41.382+0000"),
"mistakes" : NumberInt(5)
}
I need the documents grouped by badgeId, with the smallest mistakes value and the corresponding date for each group.
I cannot use $min, $max, $first, or $last for the date in $group, because I need the date from the row where mistakes is lowest.
I tried the following query using $min, but it does not give the intended result, because it picks the $min of the date independently:
db.Badges.aggregate([
{
$match: otherMatchConditions
},
{
$group: {
_id: '$badgeId',
date: {
$min: '$date'
},
mistakes: {
$min: '$mistakes'
}
}
}
])
You may sort the results by mistakes and then take the corresponding $first of mistakes and date
[
{$sort: {mistakes: 1}},
{
$group: {
_id: '$badgeId',
date: {
$first: '$date'
},
mistakes: {
$first: '$mistakes'
}
}
}
]
I think your query is right, but as you can see, the dates are invalid here:
ISODate("2016-14-22T21:26:41.382+0000") and ISODate("2016-13-22T21:26:40.382+0000"),
In these dates the months are 14 and 13, which are not valid months. Below is the corrected data after inserting it into MongoDB.
{
"_id" : ObjectId("580ca86c9a43fad551ba801f"),
"badgeId" : "ventura",
"date" : ISODate("2016-12-22T21:26:40.382Z"),
"mistakes" : 10
}
{
"_id" : "a4usNGibIu",
"badgeId" : "dog",
"date" : ISODate("2016-12-23T21:26:40.382Z"),
"mistakes" : 10
}
{
"_id" : ObjectId("580c77801d7723f3f7fe0e77"),
"badgeId" : "dog",
"date" : ISODate("2016-12-24T21:26:40.382Z"),
"mistakes" : 5
}
When applied your query
db.getCollection('COLLECTION_NAME').aggregate([
{
$group: {
_id: '$badgeId',
date: {
$min: '$date'
},
mistakes: {
$min: '$mistakes'
}
}
}
])
I got the below result.
{
"_id" : "dog",
"date" : ISODate("2016-12-23T21:26:40.382Z"),
"mistakes" : 5
}
{
"_id" : "ventura",
"date" : ISODate("2016-12-22T21:26:40.382Z"),
"mistakes" : 10
}
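Note that the result above pairs mistakes 5 for "dog" with the date 2016-12-23, although in the data shown the document with 5 mistakes is dated 2016-12-24: the two $min accumulators are computed independently of each other. The sketch below mirrors both approaches in plain JavaScript on that same data; minPerField and firstAfterSort are hypothetical helper names, and dates are kept as ISO strings for brevity.

```javascript
const docs = [
  { badgeId: "ventura", date: "2016-12-22T21:26:40.382Z", mistakes: 10 },
  { badgeId: "dog", date: "2016-12-23T21:26:40.382Z", mistakes: 10 },
  { badgeId: "dog", date: "2016-12-24T21:26:40.382Z", mistakes: 5 },
];

// Mirrors $group with two independent $min accumulators.
function minPerField(rows) {
  const out = {};
  for (const r of rows) {
    const g = out[r.badgeId] || (out[r.badgeId] = { date: r.date, mistakes: r.mistakes });
    if (r.date < g.date) g.date = r.date; // same-format ISO strings sort chronologically
    if (r.mistakes < g.mistakes) g.mistakes = r.mistakes;
  }
  return out;
}

// Mirrors $sort by mistakes ascending followed by $group with $first.
function firstAfterSort(rows) {
  const out = {};
  for (const r of [...rows].sort((a, b) => a.mistakes - b.mistakes)) {
    if (!(r.badgeId in out)) out[r.badgeId] = { date: r.date, mistakes: r.mistakes };
  }
  return out;
}

console.log(minPerField(docs).dog);
console.log(firstAfterSort(docs).dog);
```

The first call reports the 23rd with 5 mistakes, a combination that exists in no single document; the second reports the 24th with 5 mistakes, which is the row the question asks for.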

mongodb aggregation find min value and other fields in nested array

Is it possible to find the max date in a nested array and show its price, along with a parent field such as the actual price?
I want the result to look like this:
{
"_id" : ObjectId("5547e45c97d8b2c816c994c8"),
"actualPrice":19500,
"lastModifDate" :ISODate("2015-05-04T22:53:50.583Z"),
"price":"16000"
}
The data :
db.adds.findOne()
{
"_id" : ObjectId("5547e45c97d8b2c816c994c8"),
"addTitle" : "Clio pack luxe",
"actualPrice" : 19500,
"fistModificationDate" : ISODate("2015-05-03T22:00:00Z"),
"addID" : "1746540",
"history" : [
{
"price" : 18000,
"modifDate" : ISODate("2015-05-04T22:01:47.272Z"),
"_id" : ObjectId("5547ec4bfeb20b0414e8e51b")
},
{
"price" : 16000,
"modifDate" : ISODate("2015-05-04T22:53:50.583Z"),
"_id" : ObjectId("5547f87e83a1dae00bc033fa")
},
{
"price" : 19000,
"modifDate" : ISODate("2015-04-04T22:53:50.583Z"),
"_id" : ObjectId("5547f87e83a1dae00bc033fe")
}
],
"__v" : 1
}
my query
db.adds.aggregate(
[
{ $match:{addID:"1746540"}},
{ $unwind:"$history"},
{ $group:{
_id:0,
lastModifDate:{$max:"$history.modifDate"}
}
}
])
I don't know how to include the other fields. I tried $project, but I get errors.
Thanks for your help.
You could try the following aggregation pipeline which does not need to make use of the $group operator stage as the $project operator takes care of the fields projection:
db.adds.aggregate([
{
"$match": {"addID": "1746540"}
},
{
"$unwind": "$history"
},
{
"$project": {
"actualPrice": 1,
"lastModifDate": "$history.modifDate",
"price": "$history.price"
}
},
{
"$sort": { "lastModifDate": -1 }
},
{
"$limit": 1
}
])
Output
/* 1 */
{
"result" : [
{
"_id" : ObjectId("5547e45c97d8b2c816c994c8"),
"actualPrice" : 19500,
"lastModifDate" : ISODate("2015-05-04T22:53:50.583Z"),
"price" : 16000
}
],
"ok" : 1
}