I have a collection orders:

{
    "_id": "abcd",
    "last_modified": ISODate("2016-01-01T00:00:00Z"),
    "suborders": [
        {
            "suborder_id": "1",
            "last_modified": ISODate("2016-01-02T00:00:00Z")
        },
        {
            "suborder_id": "2",
            "last_modified": ISODate("2016-01-03T00:00:00Z")
        }
    ]
}
I have two indexes on this collection:

{ "last_modified": 1 }
{ "suborders.last_modified": 1 }

When I use range queries on last_modified, the index is used properly and results are returned instantly, e.g.:

db.orders.find({ "last_modified": { $gt: ISODate("2016-09-15"), $lt: ISODate("2016-09-16") } });

However, when I query on suborders.last_modified, the query takes too long to execute, e.g.:

db.orders.find({ "suborders.last_modified": { $gt: ISODate("2016-09-15"), $lt: ISODate("2016-09-16") } });

Please help me debug this.
The short answer is to use min and max to set the index bounds correctly. For how to approach debugging, read on.
A good place to start for query performance issues is to attach .explain() at the end of your queries. I made a script to generate documents like yours and execute the queries you provided.
I used MongoDB 3.2.9, and with this setup both queries do use the created indexes. However, the second query returned many more documents (approximately 6% of all the documents in the collection), which I suspect is not what you intended.
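For reference, a minimal sketch of such a generator script (the document count and date distribution are made up, not taken from the original test):

// Generate orders with suborder dates spread over 2016 (made-up sizes)
for (var i = 0; i < 100000; i++) {
    var base = new Date(2016, 0, 1 + Math.floor(Math.random() * 365));
    db.orders.insert({
        last_modified: base,
        suborders: [
            { suborder_id: "1", last_modified: new Date(base.getTime() + 86400000) },
            { suborder_id: "2", last_modified: new Date(base.getTime() + 2 * 86400000) }
        ]
    });
}
db.orders.createIndex({ "last_modified": 1 });
db.orders.createIndex({ "suborders.last_modified": 1 });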
To see what is happening, let's consider a small example in the mongo shell:
> db.arrayFun.insert({
orders: [
{ last_modified: ISODate("2015-01-01T00:00:00Z") },
{ last_modified: ISODate("2016-01-01T00:00:00Z") }
]
})
WriteResult({ "nInserted" : 1 })
Then query between May and July of 2015:
> db.arrayFun.find({"orders.last_modified": {
$gt: ISODate("2015-05-01T00:00:00Z"),
$lt: ISODate("2015-07-01T00:00:00Z")
}}, {_id: 0})
{ "orders" : [ { "last_modified" : ISODate("2015-01-01T00:00:00Z") }, { "last_modified" : ISODate("2016-01-01T00:00:00Z") } ] }
Although neither object in the array has last_modified between May and July, the document was found. That is because the query looks for one object in the array with last_modified greater than May and one object with last_modified less than July; they do not have to be the same object. MongoDB cannot intersect multikey index bounds for such queries, which is what happens in your case. You can see this in the indexBounds field of the explain("allPlansExecution") output: either the lower bound or the upper bound on the Date will not be the one you specified. Depending on your data, this can mean a large number of documents must be scanned to complete the query.
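To check this yourself, you can inspect the bounds the planner chose. A sketch, assuming the winning plan is a FETCH over an IXSCAN (the path into the explain output may differ on your version):

var plan = db.orders.find({
    "suborders.last_modified": {
        $gt: ISODate("2016-09-15T00:00:00Z"),
        $lt: ISODate("2016-09-16T00:00:00Z")
    }
}).explain("allPlansExecution");
// For a FETCH -> IXSCAN plan, the bounds live on the inner index scan stage
printjson(plan.queryPlanner.winningPlan.inputStage.indexBounds);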
To find objects in the array that have last_modified between two bounds, I tried using $elemMatch.
db.orders.find({"suborders": {
$elemMatch:{
last_modified:{
"$gt":ISODate("2016-09-15T00:00:00Z"),
"$lt":ISODate("2016-09-16T00:00:00Z")
}
}
}})
In my test this returned about 0.5% of all documents, but it was still slow. The explain output showed the index bounds were still not being set correctly (only one bound was used).
What ended up working best was to manually set the index bounds with min and max.
db.subDocs.find()
.min({"suborders.last_modified":ISODate("2016-09-15T00:00:00Z")})
.max({"suborders.last_modified":ISODate("2016-09-16T00:00:00Z")})
This returned the same documents as $elemMatch but used both bounds on the index. It ran in 0.021s, versus 2-4s for $elemMatch and the original find.
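Note that on MongoDB 4.2 and newer, min() and max() must be accompanied by a hint() naming the index, so the same query would look like this (collection and field names as above):

db.subDocs.find()
    .min({ "suborders.last_modified": ISODate("2016-09-15T00:00:00Z") })
    .max({ "suborders.last_modified": ISODate("2016-09-16T00:00:00Z") })
    .hint({ "suborders.last_modified": 1 }) // required alongside min/max on 4.2+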
Related
I have a MongoDB 4.4 cluster and a database with a collection of 200k documents and 55 indexes for different queries.
The following query:
db.getCollection('tasks').find({
    "customer": "gZuu5ZptDEtC6dq2Z",
    "finished": true,
    "$or": [
        { "scoreCalculated": { "$exists": true } },
        { "workflowProcessed": { "$exists": true } }
    ]
}).sort({
    "scoreCalculated": -1,
    "workflowProcessed": -1,
    "createdAt": -1
})
executes in less than 1 second on average (first explain output).
But if I change sort direction to
.sort({
"scoreCalculated": 1,
"workflowProcessed": 1,
"createdAt": 1
})
the execution time grows to several seconds, up to 10 (second explain output).
The first explain output shows that the apiGetTasks index is used. But that index has an ascending sort, and I don't get why it is not used when I switch the sort direction to ascending. Adding the same index with a descending sort doesn't change anything.
Could you please help me to understand why the second query is so slow?
Please share the indexes.
55 indexes is way too many; you should reduce the number, because it can harm your performance (every index you use has to be loaded into RAM, instead of letting Mongo use that RAM to optimize queries).

Moreover, the total number of examined documents is 210780, which is your entire collection. So you need to rethink how to build indexes that support your queries efficiently.

Read the MongoDB docs on indexes.
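Without seeing the index definitions this is only a guess, but an index whose key directions line up with the ascending sort might look like the sketch below (the field order and the leading equality keys are assumptions based on the query). Note that a compound index can also be traversed in reverse, so this single index would serve the all-descending sort as well:

// Hypothetical index: equality fields first, then the sort keys in query order
db.getCollection('tasks').createIndex({
    customer: 1,          // equality filter
    finished: 1,          // equality filter
    scoreCalculated: 1,   // sort key
    workflowProcessed: 1, // sort key
    createdAt: 1          // sort key
})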
Hello,

I have a MongoDB collection with around 20k documents. Deleted documents are stored in the collection as well, meaning they are "soft" deleted.

Now, I want to query the documents based on the "status" key. The status can be "open", "closed", or "deleted". I need the records with status "closed".
I see that only 25 documents fulfill my criteria. However, 18k documents are scanned (even after applying an index).

Hence, my query takes around 2 minutes to execute and often times out.

My questions are:

1. Should a query over 20k documents take this long? 20k isn't such a huge count, right?
2. Can someone guide me in optimizing the query further, if possible?

Pushing the deleted records into a separate archive collection is the last thing I would like to do.
Here is my current query:
db.collectionname.find({
    $and: [
        { $and: [ { status: { $ne: 'open' } }, { status: { $ne: 'deleted' } } ] },
        { submittedDate: { $gte: new Date("2019-02-01T00:00:00.000Z"), $lte: new Date("2019-02-02T00:00:00.000Z") } }
    ]
})
You probably have a single-field index {status: 1}. Replacing it with a compound index {status: 1, submittedDate: 1} will improve your performance; 20k documents is nothing if properly indexed. Since status can take only three values, rewrite your query as follows:
db.collectionname.find({
    status: 'closed',
    submittedDate: { $gte: new Date("2019-02-01T00:00:00.000Z"),
                     $lte: new Date("2019-02-02T00:00:00.000Z") }
})
My query is below:
db.chats.find({ bid: 'someID' }).sort({start_time: 1}).limit(10).skip(82560).pretty()
I have a compound index on the chats collection with the fields in this order:
{
"cid" : 1,
"bid" : 1,
"start_time" : 1
}
I am trying to perform a sort, but when I run the query and check the result of explain(), the winningPlan is still:
{
"stage":"SKIP",
"skipAmount":82560,
"inputStage":{
"stage":"SORT",
"sortPattern":{
"start_time":1
},
"limitAmount":82570,
"inputStage":{
"stage":"SORT_KEY_GENERATOR",
"inputStage":{
"stage":"COLLSCAN",
"filter":{
"ID":{
"$eq":"someID"
}
},
"direction":"forward"
}
}
}
}
I was expecting no SORT stage in the winning plan, since I have an index on that collection.

With no index usable for the sort, I get the following error:

MongoError: OperationFailed: Sort operation used more than the maximum 33554432 bytes of RAM

I managed to make the sort work by increasing the in-memory sort allocation from 32 MB to 64 MB, but I am looking for help with setting up the indexes properly.
The order of fields in an index matters. To sort query results by a field that is not at the start of the index key pattern, the query must include equality conditions on all the prefix keys that precede the sort keys. The cid field is neither filtered on nor used for sorting, so you must leave it out. Put the bid field first in the index definition, since you use it in an equality condition, and start_time after it, to be used for sorting. The index should therefore look like this:
{"bid" : 1, "start_time" : 1}
See the documentation for further reference.
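A sketch of the change in the shell (dropIndex accepts the key pattern of the existing index, taken from the question):

// Drop the old index by its key pattern and create one the query can use
db.chats.dropIndex({ "cid": 1, "bid": 1, "start_time": 1 })
db.chats.createIndex({ "bid": 1, "start_time": 1 })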
I need a compound index for my collection, but I'm not sure about the key order.
My item:
{
_id,
location: {
type: "Point",
coordinates: [<lng>, <lat>]
},
isActive: true,
till: ISODate("2016-12-29T22:00:00.000Z"),
createdAt : ISODate("2016-10-31T12:02:51.072Z"),
...
}
My main query is:
db.collection.find({
$and: [
{
isActive: true
}, {
'till': {
$gte: new Date()
}
},
{
'location': { $geoWithin: { $box: [ [ SWLng,SWLat], [ NELng, NELat] ] } }
}
]
}).sort({'createdAt': -1 })
In plain English: I need all active, non-expired items in the visible part of the map, newest first.
Is it normal to create this index:
db.collection.createIndex( { "isActive": 1, "till": -1, "location": "2dsphere", "createdAt": -1 } )
And what is the best key order for performance and disk usage? Or maybe I should create several indexes...
Thank you!
The order of fields in an index should be:
fields on which you will query for exact values.
fields on which you will sort.
fields on which you will query for a range of values.
In your case it would be:
db.collection.createIndex( { "isActive": 1, "createdAt": -1, "till": -1, "location": "2dsphere" } )
However, indexes on boolean fields are often not very useful, as on average MongoDB will still need to access half of your documents. So I would advise you to do the following:
duplicate the collection for test purposes
create the index you would like to test (e.g. {"isActive": 1, "createdAt": -1, "till": -1, "location": "2dsphere"})
in the mongo shell create a variable:
var exp = db.testCollection.explain('executionStats')
execute exp.find(<your query>); it will return statistics describing the execution of the winning plan
analyze fields like "nReturned", "totalKeysExamined", "totalDocsExamined"
drop the index, create a new one (e.g. {"createdAt": -1, "till": -1, "location": "2dsphere"}), execute exp.find(<your query>) again, and compare the results to the previous run (a concrete sketch follows below)
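For example, testing the main query from the question might look like this (the coordinates are placeholder values):

var exp = db.testCollection.explain('executionStats')
exp.find({
    isActive: true,
    till: { $gte: new Date() },
    location: { $geoWithin: { $box: [ [ -74.1, 40.6 ], [ -73.7, 40.9 ] ] } }
}).sort({ createdAt: -1 })
// Compare nReturned, totalKeysExamined and totalDocsExamined across runs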
In Mongo, many things depend on the data and its access patterns. There are a few things to consider when creating indexes on your collection:

How the data will be accessed from the application. (You already know the main query, so this part is almost done.)
The data size, cardinality, and span.
Operations on the data (how often reads and writes will happen, and in what pattern).

A particular query can generally use only one index at a time.

Index usage is not static. Mongo keeps changing which index is used based on heuristics, and it tries to do so in an optimized way. So if you see index1 being used at some point, Mongo may switch to index2 later, once enough data of a different type or cardinality has been entered.

Indexes can help as well as hurt your application's performance. It is best to test them via the shell or Compass before using them in production.
var ex = db.<collection>.explain("executionStats")
The line above, when entered in the mongo shell, gives you an explainable object whose cursor can be used to investigate performance issues:
ex.find(<Your query>).sort(<sort predicate>)
Points to note in the above output are:
"executionTimeMillis"
"totalKeysExamined"
"totalDocsExamined"
"stage"
"nReturned"
You want the first three (executionTimeMillis, totalKeysExamined and totalDocsExamined) to be as low as possible, while "stage" is the important field that tells you what is happening: a "COLLSCAN" stage means every document is examined to fulfil the query, and a "SORT" stage means an in-memory sort is being done. Neither is good.
Coming to your query, there are a few things to consider:

If "till" will hold a fixed value, like an end-of-month date, for all items entered during a month, then indexing it is not a good idea: the DB will still have to scan many documents even with this index, and there will be only 12 distinct values per year if it is a month-end date.
If "till" is a fixed offset from "createdAt", then it is not useful to index both.
Indexing "isActive" is not good because it can take only two values.

So please try the indexes below with your actual data and determine which one fits best, considering execution time, number of documents examined, etc.
1. {"location": "2dsphere" , "createdAt": -1}
2. {"till":1, "location": "2dsphere" , "createdAt": -1}
Apply both indexes to the collection and execute ex.find(<query>).sort(<sort>) for each, where ex is the explainable cursor. Then analyze and compare both outputs to decide on the best index; a sketch follows below.
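For instance, the two candidates could be created like this before running ex.find().sort() against each (a sketch; db.collection stands in for your collection name):

// Candidate 1
db.collection.createIndex({ "location": "2dsphere", "createdAt": -1 })
// Candidate 2 (drop candidate 1 first, so the planner's choice is unambiguous)
db.collection.dropIndex({ "location": "2dsphere", "createdAt": -1 })
db.collection.createIndex({ "till": 1, "location": "2dsphere", "createdAt": -1 })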
Imagine a collection with about 5,000,000 documents. I need to run a basicCursor query that selects ~100 documents based on too many fields to index. Let's call this the basicCursorMatch. It will be immensely slow.

I can, however, do a bTreeCursor query on a few indexes that will limit my search to ~500 documents. Let's call this query the bTreeCursorMatch.

Is there a way I can run the basicCursorMatch directly on the cursor or collection resulting from the bTreeCursorMatch?
Intuitively I tried
var cursor = collection.find(bTreeCursorMatch);
var results = cursor.find(basicCursorMatch);
similar to collection.find(bTreeCursorMatch).find(basicCursorMatch), which doesn't seem to work.
Alternatively, I was hoping I could do something like this:
collection.aggregate([
{$match: bTreeCursorMatch}, // Uses index 5,000,000 -> 500 fast
{$match: basicCursorMatch}, // No index, 500 -> 100 'slow'
{$sort}
]);
...but it seems I cannot do this either. Is there an alternative way to do what I want?

The reason I am asking is that the second query will differ a lot, and there is no way I can index all the fields. But I do want to make the first query use a bTreeCursor; otherwise, querying the whole collection with a basicCursor will take forever.
update
Also, through user input, the subselection of 500 documents will be queried in different ways during a session, with unpredictable basicCursor queries using multiple $in, $eq, $gt, $lt. During all this, the bTreeCursor subselection remains the same. Should I just keep running both queries for every user query, or is there a more efficient way to keep a reference to this subselection?
In practice, you rarely need to run second queries on a cursor. You especially don't need to break MongoDB's work into separate indexable / non-indexable chunks.
If you pass a query to MongoDB's find method that can be partially fulfilled by a look-up in an index, MongoDB will do that look-up first, and then do a full scan on the remaining documents.
For instance, I have a collection users with documents like:
{ _id : 4, gender : "M", ... }
There is an index on _id, but not on gender. There are ~200M documents in users.
To get an idea of what MongoDB is doing under the hood, add explain() to your cursor (in the Mongo shell):
> db.users.find( { _id : { $gte : 1, $lt : 10 } } ).explain()
{
"cursor" : "BtreeCursor oldId_1_state_1",
"n" : 9,
"nscannedObjects" : 9
}
I have cut out some of the fields returned by explain. Basically, cursor tells you whether an index is used, n is the number of documents returned by the query, and nscannedObjects is the number of objects scanned during the query. In this case, MongoDB was able to scan exactly the right number of objects.
What happens if we now query on gender as well?
> db.users.find( { _id : { $gte : 1, $lt : 10 }, gender : "F" } ).explain()
{
"cursor" : "BtreeCursor oldId_1_state_1",
"n" : 5,
"nscannedObjects" : 9
}
find returned 5 objects but had to scan 9 documents: it isolated the correct 9 documents using the _id field, then went through all 9 and filtered them by gender.
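So, for the question above: rather than keeping a reference to the 500-document subselection, just combine both predicates in one find() for every user query; MongoDB uses the index for the indexable part and filters the rest. A sketch, reusing the asker's placeholder query objects:

// bTreeCursorMatch narrows the candidates via the index;
// basicCursorMatch is then applied as a filter over the remaining documents
collection.find({ $and: [ bTreeCursorMatch, basicCursorMatch ] });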