MongoDB: Slow query

I am running a MongoDB query like this:
db.getCollection('collection').find({stringField:"stringValue", arrayField:{$exists:true, $ne:[]}}).sort({dateField:-1})
The collection has approx. 10^6 documents. I have indexes on the stringField and dateField (both ascending). This query takes ~3-4 seconds to run.
However, if I change my query to any of the variants below, it executes within 100 ms:
Remove $ne
db.getCollection('collection').find({stringField:"stringValue", arrayField:{$exists:true}}).sort({dateField:-1})
Remove $exists
db.getCollection('collection').find({stringField:"stringValue", arrayField:{$ne:[]}}).sort({dateField:-1})
Remove sort
db.getCollection('collection').find({stringField:"stringValue", arrayField:{$exists:true, $ne:[]}})
Use arrayField.0
db.getCollection('collection').find({stringField:"stringValue", "arrayField.0":{$exists:true}}).sort({dateField:-1})
The explain output for these queries does not provide any insight into why the first query is so slow.
MongoDB version 3.4.18

Related

MongoDB $nin query optimization

I have a MongoDB collection that has about 5 million documents. The data in the documents look something like the following:
{
since: <some Unix timestamp, e.g. 1660106561>,
team: ["a","b","c"]
}
When I run the following query, my MongoDB connection times out:
db.myCollection.find({team: {$nin: ['b']}}).sort({since: -1})
I have the following compound index on my collection:
[since, team]
Is there anything I can do to prevent this query from timing out?
This query is just very hard to execute. $nin is not a selective operator, which means it does not utilize indexes well.
For instance, the inequality operators $nin and $ne are not very selective since they often match a large portion of the index. As a result, in many cases, a $nin or $ne query with an index may perform no better than a $nin or $ne query that must scan all documents in a collection.
Specifically, in this context I believe the index is even hurting query performance. Because the index is a multikey index, using $nin on it still forces a scan of large portions of the index tree; after that, the query engine still has to fetch and filter each document to make sure it does not contain the 'b' value, essentially forcing the engine to do double work. Performance can get even worse if this is just a toy example and the cardinality of values is even greater.
I reckon that if you execute this query without any index usage, or alternatively, if the "since" date matches the "natural" order and you can drop the sort on it, you can get much better performance, like this:
db.collection.find({team: {$nin: ['b']}}).sort({$natural: -1}).hint({$natural: 1});
Mind you, any query that needs to scan millions of documents will not have a lightning fast execution speed.
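The "double work" described in this answer can be illustrated with a small in-memory model. This is only a sketch in plain JavaScript, not MongoDB code: the sample documents, the function names, and the simplification to a single-field multikey index on team (rather than the compound [since, team] index from the question) are all assumptions made for illustration.

```javascript
// In-memory model only; not MongoDB code. All names and data are hypothetical.
const sampleDocs = [
  { _id: 1, since: 100, team: ["a", "b", "c"] },
  { _id: 2, since: 200, team: ["a", "c"] },
  { _id: 3, since: 300, team: ["b"] },
  { _id: 4, since: 400, team: ["c", "d"] },
];

// A multikey index holds one entry per array element, sorted by key.
function buildMultikeyIndex(collection, field) {
  const entries = [];
  for (const doc of collection) {
    for (const value of doc[field]) {
      entries.push({ key: value, id: doc._id });
    }
  }
  return entries.sort((x, y) => (x.key < y.key ? -1 : x.key > y.key ? 1 : 0));
}

function ninViaIndex(collection, index, field, excluded) {
  let keysExamined = 0;
  const candidates = new Set();
  for (const entry of index) {
    if (excluded.includes(entry.key)) continue; // skip the excluded key range
    keysExamined++;
    candidates.add(entry.id);
  }
  // An entry for "a" says nothing about whether the same array also holds "b",
  // so every candidate document still has to be fetched and filtered.
  const byId = new Map(collection.map((doc) => [doc._id, doc]));
  let docsExamined = 0;
  const result = [];
  for (const id of candidates) {
    docsExamined++;
    const doc = byId.get(id);
    if (!doc[field].some((v) => excluded.includes(v))) result.push(doc._id);
  }
  return { result, keysExamined, docsExamined };
}

function ninViaCollectionScan(collection, field, excluded) {
  let docsExamined = 0;
  const result = [];
  for (const doc of collection) {
    docsExamined++;
    if (!doc[field].some((v) => excluded.includes(v))) result.push(doc._id);
  }
  return { result, keysExamined: 0, docsExamined };
}

const teamIndex = buildMultikeyIndex(sampleDocs, "team");
const viaIndex = ninViaIndex(sampleDocs, teamIndex, "team", ["b"]);
const viaScan = ninViaCollectionScan(sampleDocs, "team", ["b"]);
console.log(viaIndex, viaScan);
```

In this toy data set both paths return the same documents, but the index path touches index entries and then documents, while the plain scan touches each document once. Real numbers depend on the data and on the planner, so explain output with executionStats remains the thing to check.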

In mongodb, what does it mean for an index to support a query?

I have the following definition about indexes in mongodb:
An index supports a query when the index contains all the fields scanned by the query. The query scans the index and not the collection. Creating indexes that support queries results in greatly increased query performance.
Does it imply that an index is taken into account for a query execution ONLY if it contains ALL the fields requested by the query? So that, for example, if my query is searching for fields (a,b,c) and the only index in the collection was created on (b), it won't be used at all for the execution?
It depends on the query. From the Query Plans page:
For a query, the MongoDB query optimizer chooses and caches the most efficient query plan given the available indexes.
Implicit in that statement is that the query you submit may not be the query that is executed; MongoDB may rewrite your query in multiple ways during the evaluation process. Use cursor.explain() to view the query plans considered by MongoDB and see which was chosen to execute your specific query (and why it was chosen).
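As a rough aid for reading explain output, the sketch below walks a winning plan and lists its stages. It assumes the classic queryPlanner.winningPlan layout with stage, inputStage and inputStages fields; the sample object is a hypothetical, heavily abbreviated stand-in for what a query on (a, b, c) with an index on (b) might produce, and the helper names are made up for this example.

```javascript
// Hypothetical, abbreviated explain output; real output has many more fields.
const sampleExplain = {
  queryPlanner: {
    winningPlan: {
      stage: "FETCH", // remaining predicates on a and c are filtered here
      inputStage: { stage: "IXSCAN", indexName: "b_1" },
    },
  },
};

// Collect stage names from the top of the plan down to its leaves.
function listStages(plan) {
  if (!plan) return [];
  const children = plan.inputStages || (plan.inputStage ? [plan.inputStage] : []);
  return [plan.stage, ...children.flatMap((child) => listStages(child))];
}

function summarizePlan(explainOutput) {
  const stages = listStages(explainOutput.queryPlanner.winningPlan);
  return {
    stages,
    usesIndex: stages.includes("IXSCAN"),
    fullCollectionScan: stages.includes("COLLSCAN"),
  };
}

const planSummary = summarizePlan(sampleExplain);
console.log(planSummary);
```

In a shell session the same helper could be given the object returned by cursor.explain() instead of the sample. An IXSCAN under a FETCH would suggest that a partial index is still being used, with the other fields filtered after the documents are fetched.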
The diagram below is from version 4.0 of the Query Plans page but I think it does a good job illustrating the query planner logic.

How does the limit() option work in mongodb?

Let's say you have a collection of 10,000 documents and I make a find query with the option limit(50). How will MongoDB choose which 50 documents to return?
Will it auto-sort them (maybe by their creation date) or not?
Will the query return the same documents every time it is called? How does the limit option work in MongoDB?
Does MongoDB limit the documents after they are returned or as it queries them? Meaning, will MongoDB query all documents and then limit the results to 50 documents, or will it query the 50 documents only?
The first 50 documents of the result set will be returned.
If you do not sort the documents (or if the order is not well-defined, such as sorting by a field with values that occur multiple times in the result set), the order may change from one execution to the next.
Will it auto-sort them (maybe by their creation date) or not?
No.
Will the query return the same documents every time it is called?
The query may produce the same results for a while and then start producing different results if, for example, another document is inserted into the collection.
Meaning, will MongoDB query all documents and then limit the results to 50 documents, or will it query the 50 documents only?
It depends on the query. If an index provides the requested order, only the needed documents will be read from the storage engine. If a sort stage is used in the query execution, all matching documents will be read from storage and sorted; then the required number will be returned and the rest discarded.
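The difference between the two cases can be sketched with an in-memory model. This is plain JavaScript, not MongoDB code: the createdAt field, the function names, and the pre-sorted array standing in for an index are all assumptions made for illustration.

```javascript
// In-memory model only; not MongoDB code. All names are hypothetical.
function makeDocs(count) {
  const docs = [];
  for (let i = 0; i < count; i++) {
    // 7919 is coprime with 10000, so this yields each createdAt value exactly once, in shuffled order.
    docs.push({ _id: i, createdAt: (i * 7919) % count });
  }
  return docs;
}

// Case 1: documents arrive already ordered (as from an index scan), so reading stops at the limit.
function limitFromOrderedScan(orderedDocs, limit) {
  let docsRead = 0;
  const result = [];
  for (const doc of orderedDocs) {
    if (result.length === limit) break;
    docsRead++;
    result.push(doc);
  }
  return { result, docsRead };
}

// Case 2: no usable order, so every document is read before the sort can finish.
function limitAfterSort(unorderedDocs, sortField, limit) {
  let docsRead = 0;
  const buffer = [];
  for (const doc of unorderedDocs) {
    docsRead++;
    buffer.push(doc);
  }
  buffer.sort((x, y) => x[sortField] - y[sortField]);
  return { result: buffer.slice(0, limit), docsRead };
}

const allDocs = makeDocs(10000);
// A pre-sorted copy stands in for an index on createdAt.
const indexOrder = [...allDocs].sort((x, y) => x.createdAt - y.createdAt);
const fromIndex = limitFromOrderedScan(indexOrder, 50);
const fromSort = limitAfterSort(allDocs, "createdAt", 50);
console.log(fromIndex.docsRead, fromSort.docsRead);
```

Both paths return the same 50 documents here because createdAt is unique in the model; the difference is only in how many documents had to be read to get them.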

How do I debug a slow MongoDB regex query?

I have two simple queries on a collection of 22 million documents.
query 1:
db.audits.find({"w.em": /^name.lastname/i})
It returns in less than 1 second.
query 2:
db.audits.find({"w.d": /^name.lastname/i})
It runs for more than 30 seconds (and, correctly, finds no results).
The only difference between the two queries is the field I am searching on. Both fields are indexed, and the explain output is identical for both.
Here are the explains with executionStats.
How can the queries perform so differently?
I am on MongoDB 3.4.23.

Implementation of limit in mongodb

My collection is named trial and its data size is 112 MB.
My query is:
db.trial.find()
and I have added a limit of 10:
db.trial.find.limit(10)
But the limit is not working; the entire query is running.
Replace
db.trial.find.limit(10)
with
db.trial.find().limit(10)
Also, you mention that the entire database is being queried. Run this:
db.trial.find().limit(10).explain()
It will tell you how many documents it looked at before stopping the query (nscanned). You will see that nscanned will be 10.
The .limit() modifier on its own will only "limit" the results of the query that is processed, so it works as designed to "limit" the results returned. In a raw form with no query, though, nscanned should simply equal the limit you want:
db.trial.find().limit(10)
If your intent is to only operate on a set number of documents you can alter this with the $maxScan modifier:
db.trial.find({})._addSpecial( "$maxScan" , 11 )
Which causes the query engine to "give up" after the set number of documents have been scanned. But that should only really matter when there is something meaningful in the query.
If you are actually trying to do "paging", then you are better off using "range" queries with $gt and $lt and cousins to effectively change the range of selection that is done in your query.
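The range-based paging idea can be sketched with an in-memory model. This is plain JavaScript, not MongoDB code: the function names and the choice of _id as the paging key are assumptions, and the loop inside nextPage stands in for a filter such as { _id: { $gt: lastSeenId } } combined with sort({ _id: 1 }) and limit(pageSize).

```javascript
// In-memory model only; not MongoDB code. All names are hypothetical.
const items = Array.from({ length: 25 }, (_, i) => ({ _id: i + 1 }));

// Return the next page of items whose _id is greater than lastSeenId.
// The model scans linearly; a database index could seek to the start of the range instead.
function nextPage(sortedItems, lastSeenId, pageSize) {
  const page = [];
  for (const item of sortedItems) {
    if (lastSeenId !== null && item._id <= lastSeenId) continue;
    page.push(item);
    if (page.length === pageSize) break;
  }
  return page;
}

// Walk through every page by remembering the last _id seen on the previous page.
function collectAllPages(sortedItems, pageSize) {
  const pages = [];
  let lastSeenId = null;
  while (true) {
    const page = nextPage(sortedItems, lastSeenId, pageSize);
    if (page.length === 0) break;
    pages.push(page.map((item) => item._id));
    lastSeenId = page[page.length - 1]._id;
  }
  return pages;
}

const pages = collectAllPages(items, 10);
console.log(pages);
```

The point of the pattern is that each page starts from a known key value rather than counting past all earlier results, so the work per page does not have to grow with the page number.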