MongoDB Collection Scan OR Index Scan - mongodb

I have an index on "timeofcollection". The issue is that one query using this field results in a collection scan, while another results in an index scan. These are the "$match" stages from the aggregation pipelines I am posting below. Can someone explain what the issue is and how I should handle it?
If I have the following in the $match stage of the pipeline, it evaluates as an index scan:
{
    "timeofcollection": { $gte: ISODate("2020-09-24T00:00:00.000+0000"), $lt: ISODate("2020-09-25T00:00:00.000+0000") }
}
If I have the following stage in the pipeline, it evaluates as a collection scan:
{
    $match: {
        "$expr": {
            "$and": [
                {
                    "$gte": [
                        "$_id.dt",
                        {
                            "$subtract": [
                                {
                                    "$toDate": {
                                        "$dateToString": {
                                            "date": "$$NOW",
                                            "format": "%Y-%m-%dT00:00:00.000+0000"
                                        }
                                    }
                                },
                                86400000
                            ]
                        }
                    ]
                },
                {
                    "$lt": [
                        "$_id.dt",
                        {
                            "$toDate": {
                                "$dateToString": {
                                    "date": "$$NOW",
                                    "format": "%Y-%m-%dT00:00:00.000+0000"
                                }
                            }
                        }
                    ]
                }
            ]
        }
    }
}
Basically, what I am trying to achieve is to pull records falling within the last day. This works fine but involves a collection scan, which I cannot afford.
Any help?

The query planner will only use an index for an equality comparison when using the $expr operator.
It will also only use the index when the values of the expressions are constant for the query. Since the $$NOW variable is not bound until query execution begins, and will have a different value for every execution, the query planner will not use an index for a query using that variable.
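One way around this, assuming you can compute the day boundaries in the shell or application before sending the pipeline (the collection name below is a placeholder), is to pass plain Date constants so the index on timeofcollection remains usable:

var end = new Date();
end.setUTCHours(0, 0, 0, 0);                     // today at 00:00 UTC
var start = new Date(end.getTime() - 86400000);  // 24 hours earlier

db.getCollection('yourcollection').aggregate([
    { $match: { timeofcollection: { $gte: start, $lt: end } } }
    // ...rest of the pipeline
])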

This may not be a complete answer, but one obvious problem I see with your above aggregation is that, for some reason, you seem to be converting dates to text, only to convert them back to dates again. Typically, if your filter were to contain a function of timeofcollection, then the index on timeofcollection might not be usable. Try this version:
$match: {
"$expr": {
"$and": [
{
"$gte": [
"$_id.dt",
{
"$subtract": [ "$$NOW", 86400000 ]
}
],
},
{
"$lt": [
"$_id.dt", "$$NOW",
]
}
]
}
}
Note that I am assuming here that dt in the above fragment is an alias for timeofcollection, defined somewhere earlier.
The key point here is that using timeofcollection inside a function might render your index unusable. The above version may get around this problem.
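If you want to verify which plan each variant produces, a quick sketch (collection name is a placeholder) is to run the $match through explain and look for IXSCAN versus COLLSCAN in the winning plan. For example, with the index-scan version from the question:

db.getCollection('yourcollection')
    .explain("executionStats")
    .aggregate([
        { $match: { "timeofcollection": { $gte: ISODate("2020-09-24T00:00:00.000+0000"), $lt: ISODate("2020-09-25T00:00:00.000+0000") } } }
    ])
// Inspect queryPlanner.winningPlan (and executionStats) for "IXSCAN" vs "COLLSCAN".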

Related

How to write a query to find the mongoDB documents whose time difference between two Date fields is larger than a certain value?

I have a MongoDB collection whose documents contain the fields start_local_datetime and last_update_local_datetime; the data types of both are Date.
How can I find the documents whose difference between last_update_local_datetime and start_local_datetime is larger than 10 days?
I mean I want to query data like this:
start_local_datetime: 2019-08-23T10:17:42.000+00:00
terminate_local_datetime: 2019-09-19T10:17:42.000+00:00
Documents like this aren't something that I want.
start_local_datetime: 2019-08-23T10:17:42.000+00:00
terminate_local_datetime: 2019-08-25T10:17:42.000+00:00
Because terminate_local_datetime - start_local_datetime is smaller than 10 days.
You can write an aggregation pipeline using the $dateDiff operator, like this:
db.collection.aggregate([
    {
        "$match": {
            $expr: {
                "$gt": [
                    {
                        "$dateDiff": {
                            "startDate": "$start_local_datetime",
                            "unit": "day",
                            "endDate": "$terminate_local_datetime"
                        }
                    },
                    10
                ]
            }
        }
    }
])
However, note that this will only work in MongoDB 5.0 or above, as the $dateDiff operator was added in that version. For older versions, this will work:
db.collection.aggregate([
    {
        "$addFields": {
            "timeDifference": {
                "$divide": [
                    {
                        // $subtract returns the difference between two dates in milliseconds
                        "$subtract": [
                            "$terminate_local_datetime",
                            "$start_local_datetime"
                        ]
                    },
                    86400000 // milliseconds per day
                ]
            }
        }
    },
    {
        "$match": {
            $expr: { "$gt": [ "$timeDifference", 10 ] }
        }
    },
    {
        "$project": { timeDifference: 0 }
    }
])
Here, we calculate the time difference manually, in days (by dividing the millisecond difference by 86400000), and then compare it with 10.

Searching date in mongodb

In a MongoDB collection I have a field called event_startdate which is in the following format:
event_startdate:2018-10-14 16:00:00.000 (type:date)
I am storing the date and time together in this field and want to search for events happening today, using the query given below:
var date= moment().format('YYYY-MM-D')
Event.find({ event_startdate:date}).exec()
Since the time is also attached to event_startdate, I am unable to fetch today's details.
Is there a way to do this?
var date= moment().format('YYYY-MM-D') gives you a string, which you are then trying to compare to an ISO date in Mongo.
Try something along these lines:
db.getCollection('COLNAME').aggregate([
    {
        "$match": {
            "$expr": {
                $eq: [
                    "2018-10-10", // <--- Your string date goes here
                    {
                        "$dateToString": {
                            "date": "$date",
                            "format": "%Y-%m-%d"
                        }
                    }
                ]
            }
        }
    }
])
This uses the $expr operator with $dateToString.
You can also do it using find():
db.getCollection('loginDetail').find({
    "$expr": {
        $eq: [
            "2018-09-21",
            { "$dateToString": { "date": "$eventDate", "format": "%Y-%m-%d" } }
        ]
    }
})
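If you also need the query to be able to use an index on event_startdate, an alternative sketch (assuming the moment library and the Mongoose model Event from the question) is to build Date boundaries for the day and compare against those instead of strings:

var start = moment().startOf('day').toDate(); // today at 00:00 local time
var end = moment().endOf('day').toDate();     // today at 23:59:59.999 local time

Event.find({ event_startdate: { $gte: start, $lte: end } }).exec()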

MongoDB: adding fields based on partial match query - expression vs query

So I have one collection that I'd like to query/aggregate. The query is made up of several parts that are OR'ed together. For every part of the query, I have a specific set of fields that need to be shown.
So my hope was to do this with an aggregation that would $match the OR'ed queries all at once, and then use $project with $cond to decide which fields are needed. The problem here is that $cond uses expressions, while $match uses queries. That is a problem, since some query features are not available as expressions, so a simple conversion is not an option.
So I need another solution:
- I could just make a separate aggregation per subquery, because there I know which fields to match, and then merge the results together. But this will not work if I use pagination in the queries (limit/skip etc.).
- Find some other way to tag every document so I can (afterwards) remove any fields that are not needed. It might not be super efficient, but it would work. No clue yet how to do that.
- Figure out a way to build queries that are made only of expressions. For my purpose that might be good enough, but it would mean a rewrite of the query parser. It could work, but it is not ideal.
So this is the next incarnation. It will deduplicate and merge records and finally transform the result back into something resembling a normal query result:
db.getCollection('somecollection').aggregate(
[
    {
        "$facet": {
            "f1": [
                { "$match": { <some query 1> } },
                { "$project": { <some fixed field projection> } }
            ],
            "f2": [
                { "$match": { <some query 2> } },
                { "$project": { <some fixed field projection> } }
            ]
        }
    },
    {
        $project: {
            "rt": { $concatArrays: [ "$f1", "$f2" ] }
        }
    },
    { $unwind: { path: "$rt" } },
    { $replaceRoot: { newRoot: "$rt" } },
    { $group: { _id: "$_id", items: { $push: { item: "$$ROOT" } } } },
    {
        $project: {
            "rt": { $mergeObjects: "$items" }
        }
    },
    { $replaceRoot: { newRoot: "$rt.item" } }
]
);
There might still be some optimisation to be done, so any comments are welcome.
I found an extra option using $facet. This way, I can make a facet for every group of fields/subqueries. This seems to work fine, except that the result is a single document with a bunch of arrays; at first I was not sure how to convert that back to multiple documents.
Okay, so now I have it figured out. I'm not sure yet about all of the intricacies of this solution, but it seems to work in general. Here is an example:
db.getCollection('somecollection').aggregate(
[
    {
        "$facet": {
            "f1": [
                { "$match": { <some query 1> } },
                { "$project": { <some fixed field projection> } }
            ],
            "f2": [
                { "$match": { <some query 2> } },
                { "$project": { <some fixed field projection> } }
            ]
        }
    },
    {
        $project: {
            "rt": { $concatArrays: [ "$f1", "$f2" ] }
        }
    },
    { $unwind: { path: "$rt" } },
    { $replaceRoot: { newRoot: "$rt" } }
]
);

Mongo $subtract date doesn't work in aggregation $match block

I am creating a Mongo aggregation query which uses a $subtract operator in my $match block, as shown in the code below.
This query doesn't work:
db.coll.aggregate(
    [
        {
            $match: {
                timestamp: {
                    $gte: { $subtract: [new Date(), 24 * 60 * 60 * 1000] }
                }
            }
        },
        {
            $group: {
                _id: { timestamp: "$timestamp" },
                total: { $sum: 1 }
            }
        },
        {
            $project: {
                _id: 0,
                timestamp: "$_id.timestamp",
                total: "$total"
            }
        },
        {
            $sort: { timestamp: -1 }
        }
    ]
)
However, this second query works:
db.coll.aggregate(
    [
        {
            $match: {
                timestamp: {
                    $gte: new Date(new Date() - 24 * 60 * 60 * 1000)
                }
            }
        },
        {
            $group: {
                _id: { timestamp: "$timestamp" },
                total: { $sum: 1 }
            }
        },
        {
            $project: {
                _id: 0,
                timestamp: "$_id.timestamp",
                total: "$total"
            }
        },
        {
            $sort: { timestamp: -1 }
        }
    ]
)
I need to use $subtract in my $match block, so I can't use the second query.
As of MongoDB 3.6 you can use $subtract in the $match stage via $expr. Here are the docs: https://docs.mongodb.com/manual/reference/operator/query/expr/
I was able to get a query like the one you're describing using $expr and a system variable introduced in MongoDB 4.2 called $$NOW. Here is my query, which gives me orders created within the last 4 hours:
[
    { $match: {
        $expr: {
            $gt: [
                "$_created_at",
                { $subtract: [ "$$NOW", 4 * 60 * 60 * 1000 ] }
            ]
        }
    }}
]
Well, you cannot do that, and you are not meant to either. You also say you "need" to do this, but in reality you really do not.
Pretty much all of the general aggregation operators, outside of the pipeline stage operators themselves, are really only valid within a $project or a $group pipeline stage (mostly within $project), and certainly not in others.
A $match pipeline is really the same as a general "query" operation, so the only things valid in there are the query operators.
As for your "need": any "value" that is submitted within an aggregation pipeline, and particularly within a $match, needs to be evaluated outside of the actual pipeline before the BSON representation is sent to the server.
The only exception is the notation that refers to variables in the document, particularly "fieldnames" such as "$fieldname", and then only really in $project or $group. That means something that "refers" to an existing value of a document, and that is something that cannot be done within any type of "query" document expression.
If you need to work with the value of another field in the document then you work it out with $project first, as in:
db.collection.aggregate([
{ "$project": {
"fieldMath": { "$subtract": [ "$fieldOne", "$fieldTwo" ] }
}},
{ "$match": { "fieldMath": { "$gt": 2 } }}
])
For any other purpose you really want to evaluate the value "outside" the pipeline.
The above answers the question you asked, but this answers the question you didn't ask.
Your pipeline doesn't make much sense, since grouping on the "timestamp" alone is unlikely to group anything: the values have millisecond accuracy, so there will likely be no more than a few duplicates at best, even on very active systems.
It appears that you are looking for the math to group by "day", which you can do like this:
db.collection.aggregate([
    { "$group": {
        "_id": {
            "$subtract": [
                { "$subtract": [ "$timestamp", new Date(0) ] },
                { "$mod": [
                    { "$subtract": [ "$timestamp", new Date(0) ] },
                    1000 * 60 * 60 * 24
                ]}
            ]
        },
        "total": { "$sum": "$total" }
    }}
])
That "rounds" your timestamp value to a single day and has a much better chance of "aggregating" something than you would otherwise have.
Or you can use the "date aggregation operators" to do much the same thing with a composite key.
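For example, a rough sketch of that composite-key variant, using the question's timestamp field:

db.collection.aggregate([
    { "$group": {
        "_id": {
            "year": { "$year": "$timestamp" },
            "month": { "$month": "$timestamp" },
            "day": { "$dayOfMonth": "$timestamp" }
        },
        "total": { "$sum": 1 }  // count of documents per calendar day
    }}
])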
So if you want to "query" then it evaluates externally. If you want to work on a value "within the document" then you must do so in either a $project or $group pipeline stage.
The $subtract operator is an aggregation expression operator, not a query operator, so (leaving aside the $expr support described above) it cannot be used directly in a $match filter. So your options are:
(not recommended) Add a $project step before your $match step to compute the value for every document for the following match step. I would not recommend this, because the operation has to be performed on every single document in your database and it prevents the database from using an index on the timestamp field, so it could cost you a lot of performance.
(recommended) Generate the Date you want to match against in the shell / in your application: create a new Date() object, subtract 24 hours from it, store it in a variable, and perform your second query using that variable, as sketched below.
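A minimal sketch of that recommended option in the shell, using the timestamp field and coll collection from the question:

var cutoff = new Date(Date.now() - 24 * 60 * 60 * 1000); // 24 hours ago, computed client-side

db.coll.aggregate([
    { $match: { timestamp: { $gte: cutoff } } },          // plain query operator, can use an index
    { $group: { _id: { timestamp: "$timestamp" }, total: { $sum: 1 } } },
    { $project: { _id: 0, timestamp: "$_id.timestamp", total: "$total" } },
    { $sort: { timestamp: -1 } }
])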

Mongodb query specific month|year not date

How can I query a specific month in MongoDB, not a date range? I need the month to make a list of customer birthdays for the current month.
In SQL it would be something like this:
SELECT * FROM customer WHERE MONTH(bday)='09'
Now I need to translate that to MongoDB.
Note: My dates are already saved as the MongoDate type. I used this thinking it would be easy to work with, but now I can't easily find how to do this simple thing.
With MongoDB 3.6 and newer, you can use the $expr operator in your find() query. It allows you to use aggregation expressions within the query language, both in find() and in a $match stage:
db.customer.find({ "$expr": { "$eq": [{ "$month": "$bday" }, 9] } })
For older MongoDB versions, consider running an aggregation pipeline that uses the $redact operator, as it lets you combine in a single stage the functionality of a $project that creates a field representing the month of a date field and a $match that filters the documents matching the given condition (the month being September).
Here, $redact uses the $cond ternary operator to provide the conditional expression that returns the system variable controlling the redaction. The logical expression in $cond checks whether the $month of the date field equals the given value; if it matches, $redact keeps the document via the $$KEEP system variable, otherwise it discards it via $$PRUNE.
Running the following pipeline should give you the desired result:
db.customer.aggregate([
    { "$match": { "bday": { "$exists": true } } },
    {
        "$redact": {
            "$cond": [
                { "$eq": [{ "$month": "$bday" }, 9] },
                "$$KEEP",
                "$$PRUNE"
            ]
        }
    }
])
This is similar to a $project + $match combo, but there you would need to explicitly select all the other fields that should continue through the pipeline:
db.customer.aggregate([
{ "$match": { "bday": { "$exists": true } } },
{
"$project": {
"month": { "$month": "$bday" },
"bday": 1,
"field1": 1,
"field2": 1,
.....
}
},
{ "$match": { "month": 9 } }
])
Another alternative, albeit a slow query, is to use the find() method with $where:
db.customer.find({ "$where": "this.bday.getMonth() === 8" })
You can do that using aggregate with the $month projection operator:
db.customer.aggregate([
{$project: {name: 1, month: {$month: '$bday'}}},
{$match: {month: 9}}
]);
First, you need to check whether the data type is ISODate.
If not, you can change the data type as in the following example:
db.collectionName.find().forEach(function (each_object_from_collection) {
    each_object_from_collection.your_date_field = new ISODate(each_object_from_collection.your_date_field);
    db.collectionName.save(each_object_from_collection);
})
Now you can find it in two ways:
db.collectionName.find({ $expr: {
$eq: [{ $year: "$your_date_field" }, 2017]
}});
Or by aggregation:
db.collectionName.aggregate([
    {
        $project: {
            field1_you_need_in_result: 1,
            field2_you_need_in_result: 1,
            your_year_variable: { $year: '$your_date_field' },
            your_month_variable: { $month: '$your_date_field' }
        }
    },
    { $match: { your_year_variable: 2017, your_month_variable: 3 } }
]);
Yes, you can fetch this result by month and year like this:
db.collection.find({
    $expr: {
        $and: [
            { "$eq": [{ "$month": "$date" }, 3] },
            { "$eq": [{ "$year": "$date" }, 2020] }
        ]
    }
})
If you're concerned about efficiency, you may want to store the month data in a separate field within each document.
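A hedged sketch of that idea (the bdayMonth field name is an assumption, and the pipeline-style update requires MongoDB 4.2 or newer): maintain a derived month field, index it, and query it directly.

// Backfill a derived month field once (keep it in sync in your write path as well)
db.customer.updateMany(
    { bday: { $exists: true } },
    [ { $set: { bdayMonth: { $month: "$bday" } } } ]  // aggregation-pipeline style update
)
db.customer.createIndex({ bdayMonth: 1 })

// Now the birthday-month query is a plain, indexable equality match
db.customer.find({ bdayMonth: 9 })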