MongoDB - Get aggregated difference between two date fields

I have a collection called lists with the following documents:
{ "_id" : ObjectId("5a7c9f60c05d7370232a1b73"), "created_date" : ISODate("2018-11-10T04:40:11Z"), "processed_date" : ISODate("2018-11-10T04:40:10Z") }
{ "_id" : ObjectId("5a7c9f85c05d7370232a1b74"), "created_date" : ISODate("2018-11-10T04:40:11Z"), "processed_date" : ISODate("2018-11-10T04:41:10Z") }
{ "_id" : ObjectId("5a7c9f89c05d7370232a1b75"), "created_date" : ISODate("2018-11-10T04:40:11Z"), "processed_date" : ISODate("2018-11-10T04:42:10Z") }
{ "_id" : ObjectId("5a7c9f8cc05d7370232a1b76"), "created_date" : ISODate("2018-11-10T04:40:11Z"), "processed_date" : ISODate("2018-11-10T04:42:20Z") }
I need the aggregated result in the following format (based on the difference between processed_date and created_date):
[{
"30Sec":count_for_diffrence_1,
"<=60Sec":count_for_diffrence_2,
"<=90Sec":count_for_diffrence_3
}]
One more thing: can we also find out how many items took 30 sec, 60 sec, and so on? Also make sure that results counted in <=60Sec do not also appear in <=90Sec.
Any help will be appreciated.

You can try the below aggregation query in version 3.6.
$match with $expr limits the documents to those where the time difference is 90 seconds or less.
$group with $sum counts the occurrences of the different time slices.
db.collection.aggregate([
  {"$match":{"$expr":{"$lte":[{"$subtract":["$processed_date","$created_date"]},90000]}}},
  {"$group":{
    "_id":null,
    "30Sec":{"$sum":{"$cond":{"if":{"$eq":[{"$subtract":["$processed_date","$created_date"]},30000]},"then":1,"else":0}}},
    "<=60Sec":{"$sum":{"$cond":{"if":{"$lte":[{"$subtract":["$processed_date","$created_date"]},60000]},"then":1,"else":0}}},
    "<=90Sec":{"$sum":{"$cond":{"if":{"$lte":[{"$subtract":["$processed_date","$created_date"]},90000]},"then":1,"else":0}}}
  }}
])
Note: if the created date can be greater than the processed date, you may want to add a condition so you only count values where the difference is between 0 and the requested time slice. Something like:
{$and:[{"$gte":[{"$subtract":["$processed_date","$created_date"]},0]}, {"$lte":[{"$subtract":["$processed_date","$created_date"]},60000]}]}

Related

MongoDB Aggregation - Buckets Boundaries to Referenced Array

To whom it may concern:
I would like to know if there is some workaround in MongoDB to set the "boundaries" field of a "$bucket" aggregation pipeline stage to an array that's already in the previous aggregation stage. (Or some other aggregation pipeline that will get me the same result). I am using this data to create a histogram of a bunch of values. Rather than retrieve 1 million-or-so values, I can receive 20 buckets with their respective counts.
The previous stages of the pipeline yield the following result:
{
"_id" : ObjectId("5cfa6fad883d3a9b8c6ad50a"),
"boundaries" : [ 73.0, 87.25, 101.5, 115.75, 130.0 ],
"value" : 83.58970621935025
},
{
"_id" : ObjectId("5cfa6fe0883d3a9b8c6ad5a8"),
"boundaries" : [ 73.0, 87.25, 101.5, 115.75, 130.0 ],
"value" : 97.3261380262403
},
...
The "boundaries" field for every document is a result a facet/unwind/addfield with some statistical mathematics involving "value" fields in the pipeline. Therefore, every "boundaries" field value is an array of evenly spaced values in ascending order, all with the same length and values.
The following stage of the aggregation I am trying to perform is:
$bucket: {
groupBy: "$value",
boundaries : "$boundaries" ,
default: "no_group",
output: { count: { $sum: 1 } }
}
I get the following error from the explain when I try to run this aggregation:
{
"ok" : 0.0,
"errmsg" : "The $bucket 'boundaries' field must be an array, but found type: string.",
"code" : NumberInt(40200),
"codeName" : "Location40200"
}
The result I would like to get is something like this, which is the result of a basic "$bucket" pipeline operator:
{
"_id" : 73.0, // range of [73.0,87.25)
"count" : 2 // number of documents with "value" in this range.
}, {
"_id" : 87.25, // range of [87.25,101.5)
"count" : 7 // number of documents with "value" in this range.
}, {
"_id" : 101.5,
"count" : 3
}, ...
What I know:
The JIRA documentation says
'boundaries' must be constant values (can't use "$x", but can use {$add: [4, 5]}), and must be sorted.
What I've tried:
$bucketAuto does not have a linear "granularity" setting. By default, it tries to evenly distribute the values amongst the buckets, and the bucket ranges are therefore spaced differently.
Building the constant array by retrieving the pipeline results, then passing the constant array into a second pipeline (sketched below). This works, but it is inefficient and not atomic, since it requires two passes over the data. I can live with this solution if need be.
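For illustration, a minimal sketch of that two-pass workaround (the measurements collection name is hypothetical, and the first pipeline stands in for the facet/unwind/addFields stages described above):
// Pass 1: run the stages that compute the boundaries array and pull it out.
var stats = db.measurements.aggregate([
  /* ... stages that compute the "boundaries" field ... */
]).next();
// Pass 2: feed the now-constant array into $bucket.
db.measurements.aggregate([
  { $bucket: {
      groupBy: "$value",
      boundaries: stats.boundaries, // a plain array, constant at parse time
      default: "no_group",
      output: { count: { $sum: 1 } }
  }}
])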
There HAS to be a solution to this. Any workaround or alternative solutions are greatly appreciated.
Thank you for your time!

Can sorting before grouping improve query performance in Mongo using the aggregate framework?

I'm trying to aggregate data for 100 accounts for a 14-15 month period, grouping by year and month.
However, the query performance is horrible: it takes 22-27 seconds. There are currently over 15 million records in the collection, I have an index on the match criteria, and I can see using explain() that the optimizer uses it.
I tried adding another index on the sort criteria in the query below, and after adding the index the query now takes over 50 seconds! This happens even if I remove the sort from the query.
I'm extremely confused. I thought that because grouping can't use an index, sorting the input beforehand would make the grouping much faster. Is this assumption correct? If not, what other options do I have? I can tolerate query times of up to 5 seconds, but nothing more than that.
//Document Structure
{
  Acc: 1,
  UIC: true,
  date: ISODate("2015-12-01T05:00:00Z"),
  y: 2015,
  mm: 12,
  value: 22.3
}
//Query
db.MyCollection.aggregate([
{ "$match" : { "UIC" : true, "Acc" : { "$in" : [1, 2, 3, ..., 99, 100] }, "date" : { "$gte" : ISODate("2015-12-01T05:00:00Z"), "$lt" : ISODate("2017-02-01T05:00:00Z") } } },
//{ "$sort" : { "UIC" : 1, "Acc" : 1, "y" : -1, "mm" : 1 } },
{ "$group" : { "_id" : { "Num" : "$Num", "Year" : "$y", "Month" : "$mm" }, "Sum" : { "$sum" : "$value" } } }
])
What I would suggest is to write a script (it can be in Node.js) that pre-aggregates the data into a different collection. When you have long-running queries like this, it is advisable to maintain a separate collection containing the aggregated data and query that instead.
My second piece of advice is to give this aggregated collection a composite key and search it by regular expression. In your case I would make the key accountId:period. For example, for account 1 and February 2016, the key would be 1:201602.
Then you can query by account and time period using regular expressions. For instance, if you wanted the records for 2016 for account 1, you could do something like:
db.aggregatedCollection.find({ _id: /^1:2016/ })
Hope my answer was helpful.
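For illustration only, a rough sketch of that idea in the shell (the aggregatedCollection name and 1:201602 key format are from above; $toString and $merge assume MongoDB 4.2+):
db.MyCollection.aggregate([
  { "$match": { "UIC": true } },
  { "$group": {
      // Composite key "Acc:YYYYMM", e.g. "1:201602".
      "_id": { "$concat": [
        { "$toString": "$Acc" }, ":",
        { "$dateToString": { "format": "%Y%m", "date": "$date" } }
      ]},
      "Sum": { "$sum": "$value" }
  }},
  { "$merge": { "into": "aggregatedCollection" } }
])
// Registers for 2016 of account 1, via an anchored prefix regex
// (an anchored regex on _id can use the default _id index):
db.aggregatedCollection.find({ "_id": /^1:2016/ })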

Find closest date in one query

I'm currently trying to figure out a way to find the entry in MongoDB whose date is closest to the one I'm looking for.
Currently I solve the problem with 2 queries: one using $gte and limit(1) to look for the next later date, then one with $lte and limit(1) to see if there is a closer one that is earlier.
I was wondering if there might be a way to find the closest date in just one query, but I was not able to find anything on the matter.
Hope you can help me with this, or at least tell me for sure that this is the only way to do so.
db.collection.find({"time":{$gte: isoDate}}).sort({"time":1}).limit(1)
db.collection.find({"time":{$lte: isoDate}}).sort({"time":-1}).limit(1)
But I am looking for a way to do this in one query, so I don't have to compare the two results to find the closest one.
I solved a similar problem using an aggregation.
Sample data:
{
"_id" : ObjectId("5e365a1655c3f0bea76632a0"),
"time" : ISODate("2020-02-01T00:00:00Z"),
"description" : "record 1"
}
{
"_id" : ObjectId("5e365a1655c3f0bea76632a1"),
"time" : ISODate("2020-02-01T00:05:00Z"),
"description" : "record 2"
}
{
"_id" : ObjectId("5e365a1655c3f0bea76632a2"),
"time" : ISODate("2020-02-01T00:10:00Z"),
"description" : "record 3"
}
{
"_id" : ObjectId("5e365a1655c3f0bea76632a3"),
"time" : ISODate("2020-02-01T00:15:00Z"),
"description" : "record 4"
}
{
"_id" : ObjectId("5e365a1655c3f0bea76632a4"),
"time" : ISODate("2020-02-01T00:20:00Z"),
"description" : "record 5"
}
{
"_id" : ObjectId("5e365a1655c3f0bea76632a5"),
"time" : ISODate("2020-02-01T00:25:00Z"),
"description" : "record 6"
}
And I'm looking for the record nearest to ISODate('2020-02-01T00:18:00.000Z').
db.test_collection.aggregate([
  { $match: {
      time: {
        $gte: ISODate('2020-02-01T00:13:00.000Z'),
        $lte: ISODate('2020-02-01T00:23:00.000Z')
      }
  }},
  { $project: {
      time: 1,
      description: 1,
      time_dist: { $abs: { $subtract: ["$time", ISODate('2020-02-01T00:18:00.000Z')] } }
  }},
  { $sort: { time_dist: 1 } },
  { $limit: 1 }
])
The $match stage sets up a "time window". I used 5 minutes for this example.
The $project stage adds a time distance field. This is the time in milliseconds each record is from the query time of ISODate('2020-02-01T00:18:00.000Z').
Then I sorted on the time_dist field and limited the results to 1 to return the record with the time closest to ISODate('2020-02-01T00:18:00.000Z').
The result of the aggregation:
{
"_id" : ObjectId("5e365a1655c3f0bea76632a4"),
"time" : ISODate("2020-02-01T00:20:00Z"),
"description" : "record 5",
"time_dist" : NumberLong(120000)
}
Check this one (using two bounds around the target date):
db.collection.find({"time": {$gte: isoDateStart, $lt: isoDateEnd}}).sort({"time": 1}).limit(1)
Please use the date format that MongoDB supports, like the following:
ISODate("2015-10-26T00:00:00.000Z")
In Pymongo, I used the following function. The idea is to take a datetime object, subtract some days from it and add some days to it, then find a date between those two dates. If there are no such records, double the date span and try again:
import dateutil.relativedelta

def date_query(table, date, variance=1):
    '''Run a date query, returning the record with the closest available date.'''
    try:
        # Search a window of +/- `variance` days around the target date.
        date_a = date - dateutil.relativedelta.relativedelta(days=variance)
        date_b = date + dateutil.relativedelta.relativedelta(days=variance)
        # `db` is the PyMongo database handle defined elsewhere.
        result = db[table].find({'date': {'$gte': date_a, '$lt': date_b}}).sort([('date', 1)])
        result = list(result)
        assert len(result) >= 1
        return result[len(result) // 2]  # return the result closest to the center
    except AssertionError:
        # Nothing in the window: double it and retry.
        return date_query(table, date, variance=variance * 2)
According to https://stackoverflow.com/a/33351918/4885936 you don't need ISODate.
A simple, easy solution: if you want tasks with 1 hour left until the due date, just do:
const tasks = await task.find({
time: {
$gt: Date.now(),
$lt: Date.now() + 3600000 // one hour in milliseconds
}
})
This code gets the tasks due between now and one hour from now.

mongodb $dayOfYear equivalent Unix epoch time aggregation

Is there a method of grouping Unix epoch times by day, equivalent to $dayOfYear?
Or a process for aggregating floats and ints (into quartiles, hundreds, thousands, percentages)?
I'd like to avoid map-reduce, but an example of it would be awesome.
You can almost, but not quite, use Unix time seconds in the aggregation pipeline by utilizing the $mod and $divide operators.
The math is Unix time seconds / 86400 to convert seconds into days since the epoch, then that result modulo 365.25 for the day of the year (a leap day every 4 years).
So the full aggregation for $dayOfYear using seconds is almost as simple as:
db.MyCollection.aggregate([
  { $project: { "day": { $mod: [ { $divide: [ "$unix_seconds", 86400 ] }, 365.25 ] } } },
  { $group: { _id: "$day", num: { $sum: 1 } } },
  { $sort: { _id: 1 } }
])
The above adds sorting for sequential day of year.
The problem is that the result of $mod here keeps the fractional remainder, and there is no way of rounding or truncating it. Therefore the results are grouped by whole number plus remainder:
{
"_id" : 235.1864887063916,
"num" : 1
},
{
"_id" : 235.24300889818738,
"num" : 1
},
{
"_id" : 235.60299520864623,
"num" : 3
},
{
"_id" : 235.66453935674085,
"num" : 1
},
{
"_id" : 235.79900382758004,
"num" : 1
},
{
"_id" : 235.80265845312474,
"num" : 1
},
...when clearly we want only the whole number:
{
"_id" : 235,
"num" : 8
},
What would be nice would be a $trunc operator in Mongo, or mod variants returning only the whole part ($modw) or only the remainder ($modr).
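For what it's worth, MongoDB 3.2 later added $trunc and $floor, which close exactly this gap. A sketch of the same pipeline with the remainder truncated (assuming 3.2+ and the unix_seconds field from above):
db.MyCollection.aggregate([
  { $project: { day: { $floor: { $mod: [ { $divide: [ "$unix_seconds", 86400 ] }, 365.25 ] } } } },
  { $group: { _id: "$day", num: { $sum: 1 } } },
  { $sort: { _id: 1 } }
])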
JavaScript has the Date object, which would be available to any server-side JavaScript processing for MapReduce functions.
You seem to be aware of the $dayOfYear operator in the aggregation pipeline. There are other operators there for processing dates.
Unless your needs are very specific you should be using the aggregation pipeline. It is very flexible and in most cases will be considerably faster than the equivalent actions run under mapReduce.
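For example, since $add accepts a date plus milliseconds and returns a date, epoch seconds can be converted inline and fed to $dayOfYear (a sketch using the unix_seconds field from the question):
db.MyCollection.aggregate([
  { $project: { day: { $dayOfYear: { $add: [ new Date(0), { $multiply: [ "$unix_seconds", 1000 ] } ] } } } },
  { $group: { _id: "$day", num: { $sum: 1 } } },
  { $sort: { _id: 1 } }
])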

Need some help completing this aggregation pipeline

I have an analytics collection where I store queries as individual documents. I want to count the number of queries taking place over the past day (24 hours). Here's the aggregation command as it is:
db.analytics.aggregate([
  { $group: { _id: { "day": { $dayOfMonth: "$datetime" }, "hour": { $hour: "$datetime" } }, "count": { $sum: 1 } } },
  { $sort: { "_id.day": 1, "_id.hour": 1 } }
])
The result looks like:
...
{
"_id" : {
"day" : 17,
"hour" : 19
},
"count" : 8
},
{
"_id" : {
"day" : 17,
"hour" : 22
},
"count" : 1
},
{
"_id" : {
"day" : 18,
"hour" : 0
},
"count" : 1
}
...
Originally, my plan was to add a $limit operation and simply take the last 24 results. That's a great plan until you realize that some hours have no queries at all, so the last 24 documents could go back more than a single day. I thought of using $match, but I'm just not sure how to go about constructing it. Any ideas?
First of all you need to pick the day, either from the current date or from the most recent document in the collection. Then query for the specified day like:
db.analytics.aggregate([
{$project:{datetime:"$datetime",day:{$dayOfMonth:"$datetime"}}},
{$match:{day:3}},
{$group:{_id:{"hour":{$hour:"$datetime"}},"count":{$sum:1}}},
{$sort:{"_id.hour":1}}
]);
where 3 is the day of the month in {$match:{day:3}}.
The idea is to add a day field so we are able to filter by it, then group that day's documents by hour and sort.
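Alternatively, since the goal is the past 24 hours specifically, you can compute the cutoff on the client and $match on datetime directly. A sketch combining that with the original pipeline:
var cutoff = new Date(Date.now() - 24 * 60 * 60 * 1000); // 24 hours ago
db.analytics.aggregate([
  { $match: { datetime: { $gte: cutoff } } },
  { $group: { _id: { "day": { $dayOfMonth: "$datetime" }, "hour": { $hour: "$datetime" } }, "count": { $sum: 1 } } },
  { $sort: { "_id.day": 1, "_id.hour": 1 } }
]);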