How to write a query to find the mongoDB documents whose time difference between two Date fields is larger than a certain value? - mongodb

I have a mongoDB that contains documents like this:
The data types of start_local_datetime and last_update_local_datetime are both Date.
How can I find the documents whose difference between last_update_local_datetime and start_local_datetime is larger than 10 days?
I mean I want to query data like this:
start_local_datetime: 2019-08-23T10:17:42.000+00:00
terminate_local_datetime: 2019-09-19T10:17:42.000+00:00
Documents like this aren't something that I want.
start_local_datetime: 2019-08-23T10:17:42.000+00:00
terminate_local_datetime: 2019-08-25T10:17:42.000+00:00
Because terminate_local_datetime - start_local_datetime is smaller than 10 days.

You can write an aggregation pipeline using the $dateDiff operator, like this:
db.collection.aggregate([
{
"$match": {
$expr: {
"$gt": [
{
"$dateDiff": {
"startDate": "$start_local_datetime",
"unit": "day",
"endDate": "$terminate_local_datetime"
}
},
10
]
}
}
}
])
This only works in MongoDB 5.0 or later, because the $dateDiff operator was added in that version. For earlier versions, this will work:
db.collection.aggregate([
{
"$addFields": {
"timeDifference": {
"$divide": [
{
"$subtract": [ // returns the difference between the two dates in milliseconds
"$terminate_local_datetime",
"$start_local_datetime"
]
},
86400000
]
}
}
},
{
"$match": {
$expr: {
"$gt": [
"$timeDifference",
10
]
}
}
},
{
"$project": {
timeDifference: 0
}
}
])
Here, we calculate the time difference in days manually and then compare it with 10.
This is the playground link.
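As a variation on the manual calculation above, the comparison can probably be done in a single $match stage by comparing the millisecond difference against a threshold computed in the application. This is a minimal sketch rather than part of the original answer: buildMinDurationMatch and exceedsDays are hypothetical helper names, and the field names are the ones from the question.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: builds a $match stage that keeps documents whose
// endField - startField is larger than the given number of days.
function buildMinDurationMatch(startField, endField, days) {
  return {
    $match: {
      $expr: {
        $gt: [
          // $subtract on two Dates yields the difference in milliseconds
          { $subtract: ["$" + endField, "$" + startField] },
          days * MS_PER_DAY
        ]
      }
    }
  };
}

const stage = buildMinDurationMatch(
  "start_local_datetime",
  "terminate_local_datetime",
  10
);

// The same arithmetic in plain JavaScript, applied to the two example
// documents from the question.
function exceedsDays(start, end, days) {
  return new Date(end) - new Date(start) > days * MS_PER_DAY;
}

const wanted = exceedsDays(
  "2019-08-23T10:17:42.000+00:00",
  "2019-09-19T10:17:42.000+00:00",
  10
);
const unwanted = exceedsDays(
  "2019-08-23T10:17:42.000+00:00",
  "2019-08-25T10:17:42.000+00:00",
  10
);
```

The stage would be passed to the server as db.collection.aggregate([stage]). No temporary field is added, so no trailing $project is needed.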

Related

Efficiently find the most recent filtered document in MongoDB collection using datetime field

I have a large collection of documents with datetime fields in them, and I need to retrieve the most recent document for any given queried list.
Sample data:
[
{"_id": "42.abc",
"ts_utc": "2019-05-27T23:43:16.963Z"},
{"_id": "42.def",
"ts_utc": "2019-05-27T23:43:17.055Z"},
{"_id": "69.abc",
"ts_utc": "2019-05-27T23:43:17.147Z"},
{"_id": "69.def",
"ts_utc": "2019-05-27T23:44:02.427Z"}
]
Essentially, I need to get the most recent record for the "42" group as well as the most recent record for the "69" group. Using the sample data above, the desired result for the "42" group would be document "42.def".
My current solution is to query each group one at a time (looping with PyMongo), sort by the ts_utc field, and limit it to one, but this is really slow.
// Requires official MongoShell 3.6+
db = db.getSiblingDB("someDB");
db.getCollection("collectionName").find(
{
"_id" : /^42\..*/
}
).sort(
{
"ts_utc" : -1.0
}
).limit(1);
Is there a faster way to get the results I'm after?
Assuming all your documents have the format shown above, you can split the _id into two parts on the dot character and use an aggregation to find the maximum ts_utc for each first (numeric) part.
That way you can do it in one shot, instead of iterating over each group.
db.foo.aggregate([
{ $project: { id_parts : { $split: ["$_id", "."] }, ts_utc : 1 }},
{ $group: {"_id" : { $arrayElemAt: [ "$id_parts", 0 ] }, max : {$max: "$ts_utc"}}}
])
As @danh mentioned in the comments, the best approach is probably to add an auxiliary field that indicates the grouping. You can then index that field to improve performance.
Here is an ad-hoc way to derive the field and get the latest result per grouping:
db.collection.aggregate([
{
"$addFields": {
"group": {
"$arrayElemAt": [
{
"$split": [
"$_id",
"."
]
},
0
]
}
}
},
{
$sort: {
ts_utc: -1
}
},
{
"$group": {
"_id": "$group",
"doc": {
"$first": "$$ROOT"
}
}
},
{
"$replaceRoot": {
"newRoot": "$doc"
}
}
])
Here is the Mongo playground for your reference.
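To make the expected output concrete, the same "latest document per group" logic can be sketched in plain JavaScript against the sample data from the question. This only illustrates what the pipeline is meant to compute; latestPerGroup is a hypothetical helper, and it is not a substitute for running the aggregation on the server.

```javascript
// Sample data from the question
const docs = [
  { _id: "42.abc", ts_utc: "2019-05-27T23:43:16.963Z" },
  { _id: "42.def", ts_utc: "2019-05-27T23:43:17.055Z" },
  { _id: "69.abc", ts_utc: "2019-05-27T23:43:17.147Z" },
  { _id: "69.def", ts_utc: "2019-05-27T23:44:02.427Z" }
];

// Hypothetical helper: groups by the part of _id before the first dot and
// keeps the document with the greatest ts_utc in each group.
function latestPerGroup(documents) {
  const latest = new Map();
  for (const doc of documents) {
    const group = doc._id.split(".")[0];
    const current = latest.get(group);
    if (!current || Date.parse(doc.ts_utc) > Date.parse(current.ts_utc)) {
      latest.set(group, doc);
    }
  }
  return latest;
}

const result = latestPerGroup(docs);
```

Here result.get("42") is the "42.def" document, which matches the desired result stated in the question.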

MongoDB conditional query depending on possible dates

I have a scenario where I want to pull documents that have a lastAlertSentDate field that's over 30 days old. This will run in a daily cron job. Upon querying, this field will then be reset to NOW. So it's meant to act as a "rotating 30 day window" if you will.
The complication here is that the field won't exist if it hasn't been set yet. In this edge case, we'll then have to use a createdDate field of the document to do the 30-day comparison against.
So effectively, I want something like: "If lastAlertSentDate exists, get all docs where it's older than 30 days from now. Otherwise, get all docs where createdDate is older than 30 days from now."
So the logic for both fields is the same; only the field itself differs. Because of this, I was thinking of first using $addFields to add a dateToUse field and then matching on it in a second stage.
[
{
'$addFields': {
'dateToUse': {
'$cond': {
'if': {
'$ne': [
'$lastAlertSentDate', undefined
]
},
'then': '$lastAlertSentDate',
'else': '$createdDate'
}
}
}
}, {
'$match': {
'dateToUse': {
'$lte': '30_DAYS_PRIOR'
}
}
}
]
So the else part doesn't seem to work. It doesn't assign $createdDate to dateToUse.
What am I missing? Also, how can I condense this? I'm sure I don't need the addFields first and I can do everything within the $match
You have two options here:
Use a $or query with two predicates, where each of them is a $and predicate:
Either lastAlertSentDate does not exist and createdDate > n
Or lastAlertSentDate exists and it is > n
Playground Link
db.collection.find({
$or: [
{
$and: [
{
"lastAlertSentDate": {
"$exists": false
}
},
{
"createdDate": {
$gt: 5
}
}
]
},
{
$and: [
{
"lastAlertSentDate": {
"$exists": true
}
},
{
"lastAlertSentDate": {
$gt: 5
}
}
]
}
]
})
Use an aggregation with the $ifNull operator:
Playground Link
db.collection.aggregate([
{
$match: {
$expr: {
$gt: [
{
"$ifNull": [
"$lastAlertSentDate",
"$createdDate"
]
},
5
]
}
}
}
])
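The examples above compare against the placeholder value 5. For the 30-day window in the question, the cutoff date would presumably be computed in the application and compared with $lte ("older than"). The following is a minimal sketch under that assumption; buildStaleAlertMatch is a hypothetical helper, and the fixed "now" value is only there to make the example reproducible.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: builds a $match stage that keeps documents whose
// lastAlertSentDate (or createdDate, when lastAlertSentDate is missing)
// is at least `days` days before `now`.
function buildStaleAlertMatch(now, days) {
  const cutoff = new Date(now.getTime() - days * MS_PER_DAY);
  return {
    $match: {
      $expr: {
        $lte: [
          // $ifNull falls back to createdDate when the field is null or missing
          { $ifNull: ["$lastAlertSentDate", "$createdDate"] },
          cutoff
        ]
      }
    }
  };
}

// In the daily cron job this would be buildStaleAlertMatch(new Date(), 30)
const staleMatch = buildStaleAlertMatch(new Date("2021-03-31T00:00:00Z"), 30);
```

The resulting stage can be passed as the only element of the pipeline, for example db.collection.aggregate([staleMatch]).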

Mongo DB Collection Scan OR Index Scan

I have an index on "timeofcollection". One query that filters on this field runs as an index scan, while another runs as a collection scan. Both are "$match" steps in an aggregation pipeline, and I am posting them below. Can someone explain what the issue is and how I should handle it?
If I have the following in the $match step of the pipeline, it evaluates as an index scan:
{
"timeofcollection":{$gte:ISODate("2020-09-24T00:00:00.000+0000"),$lt:ISODate('2020-09-25T00:00:00.000+0000')}
}
If I have the following step in the pipeline, it evaluates as a collection scan:
{
$match: {
"$expr": {
"$and": [{
"$gte": [
"$_id.dt",
{
"$subtract": [{
"$toDate": {
"$dateToString": {
"date": "$$NOW",
"format": "%Y-%m-%dT00:00:00.000+0000"
}
}
},
86400000
]
}
],
},
{
"$lt": [
"$_id.dt",
{
"$toDate": {
"$dateToString": {
"date": "$$NOW",
"format": "%Y-%m-%dT00:00:00.000+0000"
}
}
}
]
}
]
}
}
}
Basically, what I am trying to achieve is to pull the records that fall within the last day. This works fine but involves a collection scan, which I cannot afford.
Any help?
The query planner will only use an index for equality comparison when using the $expr operator.
It will also only use the index when the values of the expressions are constant for the query. Since the $$NOW variable is not bound until query execution begins, and will have a different value for every execution, the query planner will not use an index for a query using that variable.
This may not be a complete answer, but one obvious problem I see with your above aggregation is that, for some reason, you seem to be converting dates to text, only to convert them back to dates again. Typically, if your filter were to contain a function of timeofcollection, then the index on timeofcollection might not be usable. Try this version:
$match: {
"$expr": {
"$and": [
{
"$gte": [
"$_id.dt",
{
"$subtract": [ "$$NOW", 86400000 ]
}
],
},
{
"$lt": [
"$_id.dt", "$$NOW",
]
}
]
}
}
Note that I am assuming here that dt in the above fragment is an alias for timeofcollection, defined somewhere earlier.
The key point here is that using timeofcollection inside a function might render your index unusable. The above version may get around this problem.
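Another option, not given in the answers above, is to compute the day boundaries in the application and send a plain range predicate, which is the form the question reports as an index scan. This is a minimal sketch: previousUtcDayRange is a hypothetical helper, it assumes UTC day boundaries (as the +0000 format string in the question suggests), and it assumes the filter applies to timeofcollection.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: returns a range covering the full UTC day
// before the day that `now` falls in.
function previousUtcDayRange(now) {
  const startOfToday = new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate())
  );
  const startOfYesterday = new Date(startOfToday.getTime() - MS_PER_DAY);
  return { $gte: startOfYesterday, $lt: startOfToday };
}

// In production this would be previousUtcDayRange(new Date())
const matchStage = {
  $match: {
    timeofcollection: previousUtcDayRange(new Date("2020-09-25T13:45:00Z"))
  }
};
```

For the fixed timestamp used here, the range is the same as the literal ISODate range in the question's first $match.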

Is it possible to list in mongodb the list of elements whose value is less than 10% of another field?

I basically have a database where I record motorcycles and their mileage.
{
"motorcycle":"A",
"current_km":4600,
"review_km":5000
},
{
"motorcycle":"B",
"current_km":4000,
"review_km":5000
},
{
"motorcycle":"C",
"current_km":4900,
"review_km":5000
},
{
"motorcycle":"D",
"current_km":3000,
"review_km":5000
}
I have a field called current_km that holds the current mileage, and another field called review_km that specifies the mileage at which the review should be done. A motorcycle should be listed once its current mileage (current_km) is within 10% of the review mileage (review_km).
So I would like to list the elements where:
current_km is greater than:
(review_km - ( review_km * 0.10))
for example:
current_km = 4600;
review_km = 5000;
result = 5000 - (5000 * 0.10);
4600 (current_km) >= 4500 (result) // in this case it is shown
In my database, this would show motorcycles A and C.
How can I do it? I don't know whether it is possible to do this directly in MongoDB.
You need to use an aggregation with $subtract and $multiply:
$addFields adds a new field, result, computed as (review_km - (review_km * 0.10)) using $subtract and $multiply.
$match uses $expr to check whether current_km >= result and returns the document if so.
db.collection.aggregate([
{
$addFields: {
result: {
$subtract: [
"$review_km",
{
$multiply: [
"$review_km",
0.10
]
}
]
}
}
},
{
$match: {
$expr: {
$gte: [
"$current_km",
"$result"
]
}
}
}
])
Working Playground: https://mongoplayground.net/p/s2qenvuzLKF
Shorter version
If you don't need the result field in the response, you can put the whole condition in $match, and $addFields is no longer needed:
db.collection.aggregate([
{
$match: {
$expr: {
$gte: [
"$current_km",
{
$subtract: [
"$review_km",
{
$multiply: [
"$review_km",
0.10
]
}
]
}
]
}
}
}
])
Working Playground: https://mongoplayground.net/p/fii__3tTika
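As a sanity check on the arithmetic, the same condition can be evaluated in plain JavaScript against the sample documents from the question. This is only an illustration of the comparison that $expr performs; dueForReview is a hypothetical helper name.

```javascript
// Sample data from the question
const motorcycles = [
  { motorcycle: "A", current_km: 4600, review_km: 5000 },
  { motorcycle: "B", current_km: 4000, review_km: 5000 },
  { motorcycle: "C", current_km: 4900, review_km: 5000 },
  { motorcycle: "D", current_km: 3000, review_km: 5000 }
];

// Hypothetical helper: keeps items where
// current_km >= review_km - (review_km * fraction)
function dueForReview(items, fraction) {
  return items.filter(
    (m) => m.current_km >= m.review_km - m.review_km * fraction
  );
}

const due = dueForReview(motorcycles, 0.10).map((m) => m.motorcycle);
```

With this data, due contains "A" and "C", which are the motorcycles the question expects.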

Mongo $subtract date doesn't work in aggregation $match block

I am creating a mongo aggregation query that uses the $subtract operator in my $match block, as shown in the code below.
This query doesn't work:
db.coll.aggregate(
[
{
$match: {
timestamp: {
$gte: {
$subtract: [new Date(), 24 * 60 * 60 * 1000]
}
}
}
},
{
$group: {
_id: {
timestamp: "$timestamp"
},
total: {
$sum: 1
}
}
},
{
$project: {
_id: 0,
timestamp: "$_id.timestamp",
total: "$total",
}
},
{
$sort: {
timestamp: -1
}
}
]
)
However, this second query works:
db.coll.aggregate(
[
{
$match: {
timestamp: {
$gte: new Date(new Date() - 24 * 60 * 60 * 1000)
}
}
},
{
$group: {
_id: {
timestamp: "$timestamp"
},
total: {
$sum: 1
}
}
},
{
$project: {
_id: 0,
timestamp: "$_id.timestamp",
total: "$total",
}
},
{
$sort: {
timestamp: -1
}
}
]
)
I need to use $subtract on my $match block so I can't use the last query.
As of MongoDB 3.6 you can use $subtract in the $match stage via $expr. Here are the docs: https://docs.mongodb.com/manual/reference/operator/query/expr/
I was able to get a query like what you're describing via this $expr and a new system variable in mongodb 4.2 called $$NOW. Here is my query, which gives me orders that have been created within the last 4 hours:
[
{ $match:
{ $expr:
{ $gt: [
"$_created_at",
{ $subtract: [ "$$NOW", 4 * 60 * 60 * 1000] } ]
}
}
}
]
Well, you cannot do that, and you are not meant to either. You also say that you "need" to do this, but in reality you really do not.
Pretty much all of the general aggregation operators outside of the pipeline operators are really only valid within a $project or a $group pipeline stage. Mostly within $project but certainly not in others.
A $match pipeline is really the same as a general "query" operation, so the only things valid in there are the query operators.
As for the case for your "need", any "value" that is submitted within an aggregation pipeline and particularly within a $match needs to be evaluated outside of the actual pipeline before the BSON representation is sent to the server.
The only exception is the notation that defines variables in the document, particularly "fieldnames" such as "$fieldname", and then only really in $project or $group. That notation "refers" to an existing value of a document, and that is something that cannot be done within any type of "query" document expression.
If you need to work with the value of another field in the document then you work it out with $project first, as in:
db.collection.aggregate([
{ "$project": {
"fieldMath": { "$subtract": [ "$fieldOne", "$fieldTwo" ] }
}},
{ "$match": { "fieldMath": { "$gt": 2 } }}
])
For any other purpose you really want to evaluate the value "outside" the pipeline.
The above answers the question you asked, but this answers the question you didn't ask.
Your pipeline doesn't make any sense since grouping on the "timestamp" alone would be unlikely to group anything since the values are of millisecond accuracy and there is likely not to be more than just a few at best for very active systems.
It appears like you are looking for the math to group by "day", which you can do like this:
db.collection.aggregate([
{ "$group": {
"_id": {
"$subtract": [
{ "$subtract": [ "$timestamp", new Date(0) ] },
{ "$mod": [
{ "$subtract": [ "$timestamp", new Date(0) ] },
1000 * 60 * 60 * 24
]}
]
},
"total": { "$sum": "$total" }
}}
])
That "rounds" your timestamp value to a single day and has a much better chance of "aggregating" something than you would otherwise have.
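The rounding arithmetic in that $group key can be checked in plain JavaScript. This is a small sketch of the same math, with floorToUtcDay as a hypothetical helper name and arbitrary sample timestamps; note that the resulting key is a number of milliseconds since the epoch, not a Date.

```javascript
const MS_PER_DAY = 1000 * 60 * 60 * 24;

// Hypothetical helper mirroring the pipeline expression:
// (timestamp - epoch) - ((timestamp - epoch) mod one day)
function floorToUtcDay(date) {
  const ms = date - new Date(0); // like $subtract: [ "$timestamp", new Date(0) ]
  return ms - (ms % MS_PER_DAY);
}

const morning = floorToUtcDay(new Date("2014-03-05T08:30:00Z"));
const evening = floorToUtcDay(new Date("2014-03-05T23:59:59Z"));
const nextDay = floorToUtcDay(new Date("2014-03-06T00:00:00Z"));
```

Two timestamps on the same UTC day produce the same key, so they fall into the same group, while the next day's key is exactly one day larger.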
Or you can use the "date aggregation operators" to do much the same thing with a composite key.
So if you want to "query" then it evaluates externally. If you want to work on a value "within the document" then you must do so in either a $project or $group pipeline stage.
The $subtract operator is a projection-operator. It is only available during a $project step. So your options are:
(not recommended) Add a $project-step before your $match-step to convert the timestamp field of all documents for the following match-step. I would not recommend you to do this because this operation needs to be performed on every single document on your database and prevents the database from using an index on the timestamp field, so it could cost you a lot of performance.
(recommended) Generate the Date you want to match against in the shell / in your application. Generate a new Date() object, store it in a variable, subtract 24 hours from it and perform your 2nd query using that variable.
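The recommended option can be sketched roughly as follows. buildRecentMatch is a hypothetical helper, and the fixed "now" value is only there to make the example reproducible; in practice it would be new Date().

```javascript
// Hypothetical helper: computes the cutoff in the application and builds
// a plain $match that compares the timestamp field against it.
function buildRecentMatch(now, windowMs) {
  const cutoff = new Date(now.getTime() - windowMs);
  return { $match: { timestamp: { $gte: cutoff } } };
}

const recent = buildRecentMatch(
  new Date("2014-03-05T12:00:00Z"),
  24 * 60 * 60 * 1000 // 24 hours in milliseconds
);
```

Because the stage contains only a constant Date, it has the same shape as the $match in the second (working) query from the question, and it can be placed first in the pipeline before the $group, $project and $sort stages.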