MongoDB - Querying between a time range of hours

I have a MongoDB datastore set up with location data stored like this:
{
    "_id" : ObjectId("51d3e161ce87bb000792dc8d"),
    "datetime_recorded" : ISODate("2013-07-03T05:35:13Z"),
    "loc" : {
        "coordinates" : [
            0.297716,
            18.050614
        ],
        "type" : "Point"
    },
    "vid" : "11111-22222-33333-44444"
}
I'd like to be able to perform a query similar to the date range example, but on a time range instead, i.e. retrieve all points recorded between 12 PM and 4 PM (equivalently, between 1200 and 1600 in 24-hour time).
e.g.
With points:
"datetime_recorded" : ISODate("2013-05-01T12:35:13Z"),
"datetime_recorded" : ISODate("2013-06-20T05:35:13Z"),
"datetime_recorded" : ISODate("2013-01-17T07:35:13Z"),
"datetime_recorded" : ISODate("2013-04-03T15:35:13Z"),
a query
db.points.find({ 'datetime_recorded': {
    $gte: Date(1200 hours),
    $lt: Date(1600 hours) }
});
would yield only the first and last point.
Is this possible? Or would I have to do it for every day?

Well, the best way to solve this is to store the minutes separately as well. But you can get around this with the aggregation framework, although that is not going to be very fast:
db.so.aggregate( [
    { $project: {
        loc: 1,
        vid: 1,
        datetime_recorded: 1,
        minutes: { $add: [
            { $multiply: [ { $hour: '$datetime_recorded' }, 60 ] },
            { $minute: '$datetime_recorded' }
        ] }
    } },
    { $match: { 'minutes' : { $gte : 12 * 60, $lt : 16 * 60 } } }
] );
In the first stage, $project, we calculate minutes since midnight as hour * 60 + min, which we then match against in the second stage, $match.
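The hour * 60 + minute arithmetic can be sanity-checked in plain JavaScript, using the UTC accessors since $hour and $minute operate on the stored UTC value by default (minutesSinceMidnightUTC and inRange are illustrative helpers, not MongoDB APIs):

```javascript
// Minutes since midnight (UTC) for a stored date, mirroring the
// $add / $multiply / $hour / $minute expression in the pipeline.
function minutesSinceMidnightUTC(d) {
  return d.getUTCHours() * 60 + d.getUTCMinutes();
}

// The $match stage then keeps values in [12 * 60, 16 * 60)
const inRange = m => m >= 12 * 60 && m < 16 * 60;

console.log(minutesSinceMidnightUTC(new Date("2013-07-03T05:35:13Z"))); // 335
console.log(inRange(335)); // false: 05:35 is outside 12:00-16:00
console.log(inRange(minutesSinceMidnightUTC(new Date("2013-04-03T15:35:13Z")))); // true
```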

Adding an answer since I disagree with the other answers: even though there are great things you can do with the aggregation framework, this really is not an optimal way to perform this type of query.
If your identified application usage pattern is that you rely on querying for "hours" or other times of the day without wanting to look at the "date" part, then you are far better off storing that as a numeric value in the document. Something like "milliseconds from start of day" would be granular enough for as many purposes as a BSON Date, but of course gives better performance without the need to compute for every document.
Set Up
This does require some set-up in that you need to add the new fields to your existing documents and make sure you add these on all new documents within your code. A simple conversion process might be:
MongoDB 4.2 and upwards
This can actually be done in a single request, since "update" statements now accept aggregation pipelines.
db.collection.updateMany(
    {},
    [{ "$set": {
        "timeOfDay": {
            "$mod": [
                { "$toLong": "$datetime_recorded" },
                1000 * 60 * 60 * 24
            ]
        }
    }}]
)
Older MongoDB
var batch = [];
db.collection.find({ "timeOfDay": { "$exists": false } }).forEach(doc => {
    batch.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": {
                "$set": {
                    "timeOfDay": doc.datetime_recorded.valueOf() % (60 * 60 * 24 * 1000)
                }
            }
        }
    });

    // write once only per reasonable batch size
    if ( batch.length >= 1000 ) {
        db.collection.bulkWrite(batch);
        batch = [];
    }
})

if ( batch.length > 0 ) {
    db.collection.bulkWrite(batch);
    batch = [];
}
If you can afford to write to a new collection, then looping and rewriting would not be required:
db.collection.aggregate([
    { "$addFields": {
        "timeOfDay": {
            "$mod": [
                { "$subtract": [ "$datetime_recorded", new Date(0) ] },
                1000 * 60 * 60 * 24
            ]
        }
    }},
    { "$out": "newcollection" }
])
Or with MongoDB 4.0 and upwards:
db.collection.aggregate([
    { "$addFields": {
        "timeOfDay": {
            "$mod": [
                { "$toLong": "$datetime_recorded" },
                1000 * 60 * 60 * 24
            ]
        }
    }},
    { "$out": "newcollection" }
])
All using the same basic conversion of:
1000 milliseconds in a second
60 seconds in a minute
60 minutes in an hour
24 hours a day
Taking the modulo of the numeric milliseconds since epoch, which is the value a BSON Date actually stores internally, is the simple way to extract the milliseconds elapsed in the current day.
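The conversion is easy to verify in plain JavaScript, since Date.prototype.getTime() exposes the same milliseconds-since-epoch value that a BSON date stores:

```javascript
const MS_PER_DAY = 1000 * 60 * 60 * 24; // 86400000

// Milliseconds elapsed since midnight UTC, i.e. the "timeOfDay" value
// that the $mod conversion produces from the stored epoch milliseconds.
function timeOfDay(d) {
  return d.getTime() % MS_PER_DAY;
}

const recorded = new Date("2013-07-03T05:35:13Z");
console.log(timeOfDay(recorded)); // 5h 35m 13s expressed in milliseconds
```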
Query
Querying is then really simple, and as per the question example:
db.collection.find({
    "timeOfDay": {
        "$gte": 12 * 60 * 60 * 1000, "$lt": 16 * 60 * 60 * 1000
    }
})
This of course uses the same time scale conversion from hours into milliseconds to match the stored format; as before, you can make this whatever scale you actually need.
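As a small illustration, the hours-to-milliseconds conversion for the query bounds can be wrapped in a helper (clockToMs is a hypothetical name, not part of any driver):

```javascript
// Convert a 24-hour clock time to milliseconds since midnight,
// matching the scale of the stored "timeOfDay" field.
function clockToMs(hours, minutes = 0) {
  return (hours * 60 + minutes) * 60 * 1000;
}

console.log(clockToMs(12)); // 43200000
console.log(clockToMs(16)); // 57600000
// e.g. db.collection.find({ "timeOfDay": { "$gte": clockToMs(12), "$lt": clockToMs(16) } })
```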
Most importantly, as real document properties which don't rely on computation at run-time, you can place an index on this:
db.collection.createIndex({ "timeOfDay": 1 })
So not only does this remove the run-time calculation overhead, but with an index you can also avoid collection scans, as outlined in the MongoDB documentation on indexing.
For optimal performance you never want to calculate such things at query time: at any real-world scale, processing every document in the collection just to work out which ones you want takes an order of magnitude longer than referencing an index and fetching only those documents.
The aggregation framework may just be able to help you rewrite the documents here, but it really should not be used as a production system method of returning such data. Store the times separately.

Related

MongoDB sort using a custom function

Let's say I have a collection that looks like:
{
    _id: 'aaaaaaaaaaaaaaaaaaaaaaaaa',
    score: 10,
    hours: 50
},
{
    _id: 'aaaaaaaaaaaaaaaaaaaaaaaab',
    score: 5,
    hours: 55
},
{
    _id: 'aaaaaaaaaaaaaaaaaaaaaaaac',
    score: 15,
    hours: 60
}
I want to sort this list by a custom order, namely
value = (score - 1) / (T + 2) ^ G
score: score
T: current_hours - hours
G: some constant
How do I do this? I assume this is going to require writing a custom sorting function that compares the score and hours fields in addition to taking a current_hours as an input, performs that comparison and returns the sorted list. Note that hours and current_hours are simply the number of hours that have elapsed since some arbitrary starting point. So if I'm running this query 80 hours after the application started, current_hours takes the value of 80.
Creating an additional field value and keeping it constantly updated is probably too expensive for millions of documents.
I know that if this is possible, this is going to look something like
db.items.aggregate([
    { "$project" : {
        "_id" : 1,
        "score" : 1,
        "hours" : 1,
        "value" : { SOMETHING HERE, ALSO REQUIRES PASSING current_hours }
    } },
    { "$sort" : { "value" : 1 } }
])
but I don't know what goes into value
I think value will look something like this:
"value": {
$let: {
vars: {
score: "$score",
t: {
"$subtract": [
80,
"$hours"
]
},
g: 3
},
in: {
"$divide": [
{
"$subtract": [
"$$score",
1
]
},
{
"$pow": [
{
"$add": [
"$$t",
2
]
},
"$$g"
]
}
]
}
}
}
Although it's verbose, it should be reasonably straightforward to follow. It uses the arithmetic expression operators to build the calculation that you are requesting. A few specific notes:
We use $let here to set some vars for usage. This includes the "runtime" value for current_hours (80 in the example per the description) and 3 as an example for G. We also "reuse" score here which is not strictly necessary, but done for consistency of the next point.
$ refers to fields in the document while $$ refers to variables. That's why everything in the vars definition uses $ and everything for the actual calculation in in uses $$. The reference to score inside of in could have been done via just the field name ($), but I personally prefer the consistency of this approach.
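As a cross-check, the same value formula can be evaluated in plain JavaScript with the example documents, using current_hours = 80 and G = 3 as in the pipeline above:

```javascript
// value = (score - 1) / (T + 2) ^ G, where T = current_hours - hours
function value(doc, currentHours, g) {
  const t = currentHours - doc.hours;
  return (doc.score - 1) / Math.pow(t + 2, g);
}

const docs = [
  { _id: "a", score: 10, hours: 50 },
  { _id: "b", score: 5, hours: 55 },
  { _id: "c", score: 15, hours: 60 },
];

// Ascending sort on the computed value, mirroring { "$sort": { "value": 1 } }
docs.sort((x, y) => value(x, 80, 3) - value(y, 80, 3));
console.log(docs.map(d => d._id));
```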

How to retrieve documents with createdAt more than 48 hours in mongoose

I need to retrieve documents in Mongoose where the createdAt timestamp is more than 48 hours old.
Here's my sample code below, but it doesn't retrieve any documents even though there are documents that match the condition.
Model.find({
    createdAt: { $lt: new Date(Date.now() - 2 * 24 * 60 * 60 * 1000) },
});
NB: The createdAt field is the default in mongoose when timestamp is enabled { timestamps: true }
I would really appreciate it if anyone can help out, thanks in advance.
Try:
var days = 2;
var date = new Date();
date.setDate(date.getDate() - days);
Model.find({ createdAt: { $lt: date } }).count();
With the MongoDB aggregation framework, we have access to the $$NOW (standalone) | $$CLUSTER_TIME (cluster) variable, which returns the current date.
If we subtract 172800000 milliseconds (48 hours) from the current date and use the $expr operator, we can get the desired result.
Try this one:
Model.aggregate([
    {
        $match: {
            $expr: {
                $gte: [
                    "$createdAt",
                    {
                        $toDate: {
                            $subtract: [
                                { $toLong: "$$CLUSTER_TIME" },
                                172800000 // 2 x 24 x 60 x 60 x 1000
                            ]
                        }
                    }
                ]
            }
        }
    }
]).exec();
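The millisecond arithmetic itself can be sanity-checked in plain JavaScript (a fixed "now" is used here so the example is deterministic):

```javascript
const TWO_DAYS_MS = 2 * 24 * 60 * 60 * 1000; // 172800000

// The cutoff the pipeline computes: "now" minus 48 hours.
const now = new Date("2022-01-03T00:00:00Z");
const cutoff = new Date(now.getTime() - TWO_DAYS_MS);
console.log(cutoff.toISOString()); // 2022-01-01T00:00:00.000Z

// A document created after the cutoff is within the last 48 hours
console.log(new Date("2022-01-02T12:00:00Z") >= cutoff); // true
```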
Thanks to everyone that tried to help out.
My solution above works; I wasn't getting any records at the time because, unknown to me, my env file was pointing to another MongoDB server.

MongoDB query is slow and not using Indexes

I have a collection with structure like this:
{
    "_id" : ObjectId("59d7cd63dc2c91e740afcdb"),
    "enrollment" : [
        { "month": -10, "enroll": '00' },
        { "month": -9, "enroll": '00' },
        { "month": -8, "enroll": '01' },
        // other months
        { "month": 8, "enroll": '11' },
        { "month": 9, "enroll": '11' },
        { "month": 10, "enroll": '00' }
    ]
}
I am trying to run the following query:
db.getCollection('collection').find({
    "enrollment": {
        "$not": {
            "$elemMatch": { "month": { "$gte": -2, "$lte": 9 }, "enroll": "00" }
        }
    }
}).count()
This query takes 1.6 to 1.9 seconds. I need to get this as low as possible, down to milliseconds if that is achievable.
I tried creating a multikey index on the month and enroll fields. I tried various combinations, but the query is not using any indexes.
I tried all these combinations:
1. { 'enrollment.month':1 }
2. { 'enrollment.month':1 }, { 'enrollment.enroll':1 } -- two separate indexes
3. { 'enrollment.month':1, 'enrollment.enroll':1}
4. { 'enrollment.enroll':1, 'enrollment.month':1}
Parsed query and query plan: (explain output not reproduced here)
Any suggestions to improve the performance are highly appreciated.
I am fairly confident that the hardware is not an issue, but I am open to any suggestions.
My data size is not huge; it's just under 1 GB. The total number of documents is 41K, and the subdocument count is approximately 13 million.
Note: I have posted a couple of questions on this in the last few days, but with this one I am trying to narrow down the problem area. Please do not take this as a duplicate of my earlier questions.
Try inverting the query:
db.getCollection('collection').find({
    "enrollment": {
        "$elemMatch": {
            "$or": [
                { "month": { "$lt": -2 } },
                { "month": { "$gt": 9 } }
            ],
            "enroll": { "$ne": "00" }
        }
    }
}).count()

MongoDB query: how to select the longest period of time of a matched value

I have a mongo database with many records in the format of:
{
    id: {
        $_id
    },
    date: {
        $date: YYYY-MM-DDThh:mm:ssZ
    },
    reading: X.XX
}
where date is a timestamp in mongo and reading is a float (id is just the unique identifier for the data point).
I would like to be able to count the longest period of time when the reading was a certain value (let's say 0.00 for ease of use) and return the start and end points of this time period. If there were more than one time period of the same length, I would like them all returned.
Ultimately, for example, I would like to be able to say
"The longest time period the reading is 0.00 and 1.25 hours
between
2000-01-01T00:00:00 - 2000-01-01T01:15:00,
2000-06-01T02:00:00 - 2000-06-01T03:15:00,
2000-11-11T20:00:00 - 2000-11-11T21:15:00 ."
For my mongo aggregation query I am thinking of doing this:
get the timeframe I am interested in (e.g. 2000-01-01 to 2001-01-01)
sort the data by date descending
somehow select the longest run when the reading is 0.00.
This is the query I have so far:
[
    {
        $match: {
            date: { $gte: ISODate("2000-01-01T00:00:00.0Z"), $lt: ISODate("2001-01-01T00:00:00.0Z") }
        }
    },
    { "$sort": { "date": -1 } },
    {
        "$group" : {
            "_id": null,
            "Maximum": { "$max": { "max": "$reading", "date": "$date" } },
            "Longest": { XXX: { start_dates: [], end_dates: [] } }
        }
    },
    {
        "$project": {
            "_id": 0,
            "max": "$Maximum",
            "longest": "$Longest"
        }
    }
]
I do not know how to select the longest run. How would you do this?
(You will notice I am also interested in the maximum reading within the time period and the dates on which that maximum reading happens. At the moment I am only recording the latest date/time this occurs but would like it to record all the dates/times the maximum value occurs on eventually.)
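For comparison with what the aggregation would need to do, a minimal client-side sketch of the longest-run scan, assuming the matched documents have been fetched sorted ascending by date (findLongestRuns is a hypothetical helper, not a MongoDB operator):

```javascript
// Scan date-sorted readings and collect all runs of a target value,
// returning the run(s) tied for the longest duration as { start, end } pairs.
function findLongestRuns(readings, target) {
  const runs = [];
  let current = null;
  for (const r of readings) {
    if (r.reading === target) {
      if (current === null) current = { start: r.date, end: r.date };
      else current.end = r.date;
    } else if (current !== null) {
      runs.push(current);
      current = null;
    }
  }
  if (current !== null) runs.push(current);
  if (runs.length === 0) return [];
  const longest = Math.max(...runs.map(r => r.end - r.start));
  return runs.filter(r => r.end - r.start === longest);
}

const readings = [
  { date: new Date("2000-01-01T00:00:00Z"), reading: 0.0 },
  { date: new Date("2000-01-01T01:15:00Z"), reading: 0.0 },
  { date: new Date("2000-01-01T02:00:00Z"), reading: 1.5 },
  { date: new Date("2000-01-01T03:00:00Z"), reading: 0.0 },
];
console.log(findLongestRuns(readings, 0.0));
```

Because ties are kept, all periods sharing the maximum duration come back together, matching the requirement in the question.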

mongodb: query for the time period between two date fields

If I have documents in the following schema saved in my mongoDB:
{
    createdDate: Date,
    lastUpdate: Date
}
is it possible to query for documents where the period of time between creation and the last update is e.g. greater than one day?
The best option is to use the $redact aggregation pipeline stage:
db.collection.aggregate([
    { "$redact": {
        "$cond": {
            "if": {
                "$gt": [
                    { "$subtract": [ "$lastUpdate", "$createdDate" ] },
                    1000 * 60 * 60 * 24
                ]
            },
            "then": "$$KEEP",
            "else": "$$PRUNE"
        }
    }}
])
So you are looking at the milliseconds value of the difference being greater than the milliseconds value of one day. The $subtract does the math for the difference, and when two dates are subtracted the difference in milliseconds is returned.
The $redact operator takes a logical expression as "if", and where that condition is true it takes the action in "then" which is to $$KEEP the document. Where it is false then the document is removed from results with $$PRUNE.
Note that since this is a logical condition and not a set value or a range of values, then an "index" is not used.
Since the operations in the aggregation pipeline are natively coded, this is the fastest execution of such a statement that you can get though.
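The same date subtraction behaves identically in plain JavaScript, where subtracting two Date objects coerces both to milliseconds since the epoch:

```javascript
const createdDate = new Date("2021-12-04T09:20:00Z");
const lastUpdate = new Date("2021-12-05T18:00:00Z");

// Date - Date coerces both operands to milliseconds since the epoch,
// just as $subtract does with two BSON dates.
const diffMs = lastUpdate - createdDate;
console.log(diffMs); // 32 hours 40 minutes, in milliseconds
console.log(diffMs > 1000 * 60 * 60 * 24); // true: more than one day apart
```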
The alternate is JavaScript evaluation with $where. This takes a JavaScript function expression that needs to similarly return a true or false value. In the shell you can shorthand like this:
db.collection.find(function() {
    return ( this.lastUpdate.valueOf() - this.createdDate.valueOf() )
        > ( 1000 * 60 * 60 * 24 );
})
Same thing, except that JavaScript evaluation requires interpretation and will run much slower than the .aggregate() equivalent. By the same token, this type of expression cannot use an index to optimize performance.
For the best results, store the difference in the document. Then you can simply query directly on that property, and of course you can index it as well.
You can use $expr (a query operator added in MongoDB 3.6) to use aggregation expressions in a regular query.
Compare query operators vs aggregation comparison operators.
db.col.find({
    $expr: {
        $gt: [
            { "$subtract": [ "$lastUpdate", "$createdDate" ] },
            1000 * 60 * 60 * 24
        ]
    }
})
Starting in Mongo 5, it's a perfect use case for the new $dateDiff aggregation operator:
// { created: ISODate("2021-12-05T13:20"), lastUpdate: ISODate("2021-12-06T05:00") }
// { created: ISODate("2021-12-04T09:20"), lastUpdate: ISODate("2021-12-05T18:00") }
db.collection.aggregate([
    { $match: {
        $expr: {
            $gt: [
                { $dateDiff: { startDate: "$created", endDate: "$lastUpdate", unit: "hour" } },
                24
            ]
        }
    }}
])
// { created: ISODate("2021-12-04T09:20"), lastUpdate: ISODate("2021-12-05T18:00") }
This computes the number of hours of difference between the created and lastUpdate dates and checks if it's more than 24 hours.
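A plain JavaScript approximation of the hour difference (note that MongoDB's $dateDiff counts crossed hour boundaries rather than truncated elapsed time, so for timestamps not on the hour the two can differ slightly; hoursBetween is an illustrative helper):

```javascript
// Whole hours elapsed between two dates, truncated toward zero.
function hoursBetween(start, end) {
  return Math.floor((end - start) / (1000 * 60 * 60));
}

const a = hoursBetween(new Date("2021-12-05T13:20:00Z"), new Date("2021-12-06T05:00:00Z"));
const b = hoursBetween(new Date("2021-12-04T09:20:00Z"), new Date("2021-12-05T18:00:00Z"));
console.log(a); // 15: not more than 24 hours apart, so filtered out
console.log(b > 24); // true: this pair is kept by the $match
```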