How do I debug a slow MongoDB regex query?

I have two simple queries on a collection of 22 million documents.
query 1:
db.audits.find({"w.em": /^name.lastname/i})
It returns in less than 1 second.
query 2:
db.audits.find({"w.d": /^name.lastname/i})
It runs for more than 30 seconds (and correctly finds no results).
The only difference between the two queries is the field I am searching on. Both fields are indexed; you can find the explain output for both here: it is identical!
Here are the explain outputs with executionStats.
How can the queries perform so differently?
I am on MongoDB 3.4.23.

Related

In mongodb, what does it mean for an index to support a query?

I have the following definition about indexes in mongodb:
An index supports a query when the index contains all the fields scanned by the query. The query scans the index and not the collection. Creating indexes that support queries results in greatly increased query performance.
Does it imply that an index is taken into account for a query execution ONLY if it contains ALL the fields requested by the query? So that, for example, if my query is searching for fields (a,b,c) and the only index in the collection was created on (b), it won't be used at all for the execution?
It depends on the query. From the Query Plans page:
For a query, the MongoDB query optimizer chooses and caches the most efficient query plan given the available indexes.
Implicit in that statement is that the query you submit may not be the query that is executed; MongoDB may rewrite your query in multiple ways during the evaluation process. Use cursor.explain() to view the query plans considered by MongoDB and see which was chosen to execute your specific query (and why it was chosen).
The diagram below is from version 4.0 of the Query Plans page but I think it does a good job illustrating the query planner logic.

MongoDB: Slow query

I am running a mongo query like this
db.getCollection('collection').find({stringField:"stringValue", arrayField:{$exists:true, $ne:[]}}).sort({dateField:-1})
The collection has approx. 10^6 documents. I have indexes on the stringField and dateField (both ascending). This query takes ~3-4 seconds to run.
However, if I change my query to either of the below, it executes within 100ms
Remove $ne
db.getCollection('collection').find({stringField:"stringValue", arrayField:{$exists:true}}).sort({dateField:-1})
Remove $exists
db.getCollection('collection').find({stringField:"stringValue", arrayField:{$ne:[]}}).sort({dateField:-1})
Remove sort
db.getCollection('collection').find({stringField:"stringValue", arrayField:{$exists:true, $ne:[]}})
Use arrayField.0
db.getCollection('foodfulfilments').find({stringField:"stringValue", "arrayField.0":{$exists:true}}).sort({dateField:-1})
The explain output for these queries does not provide any insight into why the first query is so slow.
MongoDB version 3.4.18

What's the difference between the following two MongoDB queries?

I ran the following two queries and they returned different results.
// my query 1
> db.events.count({"startTimeUnix":{$lt:1533268800000},"startTimeUnix":{$gte:1533182400000}})
131
// existing app query 2
> db.events.count({"startTimeUnix":{"$lt":1533268800000,"$gte":1533182400000}})
0
Query 2 is already used in the batch application, but it was reported to be pulling fewer records, which I confirmed with these queries.
//these counts are confusing
> db.events.count()
2781
> db.events.count({"startTimeUnix":{$lt:1533268800000}})
361
> db.events.count({"startTimeUnix":{$gte:1533182400000}})
2780
Use the second query. You can add explain() to find out the query plans. The first query
db.events.count({"startTimeUnix":{$lt:1533268800000},"startTimeUnix":{$gte:1533182400000}})
is evaluated the same as
db.events.explain().count({"startTimeUnix":{$gte:1533182400000}})
Use the command below to view the query plans.
db.events.explain().count({"startTimeUnix":{$lt:1533268800000},"startTimeUnix":{$gte:1533182400000}})
Query 2 is the implicit (and proper) way of building an AND condition; query 1 is incorrect in terms of MongoDB syntax. The way it gets analyzed is pretty simple: both conditions use the same key, so the first condition is overridden by the second one, and the query has the same meaning as:
db.events.count({"startTimeUnix":{$gte:1533182400000}})
The first condition simply gets ignored, and that's why you're getting more results (described here).
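The override can be reproduced without a database, because the shell builds the filter as an ordinary JavaScript object before sending it. A minimal sketch in plain JavaScript (the variable names query1 and query2 are illustrative, not from the original post):

```javascript
// Query 1 as written: the same key appears twice in one object literal.
// JavaScript keeps only the last value for a duplicated key.
const query1 = {
  "startTimeUnix": { $lt: 1533268800000 },
  "startTimeUnix": { $gte: 1533182400000 }
};

// Query 2: both operators live under a single key, so both survive.
const query2 = {
  "startTimeUnix": { "$lt": 1533268800000, "$gte": 1533182400000 }
};

console.log(JSON.stringify(query1));
// {"startTimeUnix":{"$gte":1533182400000}}
console.log(JSON.stringify(query2));
// {"startTimeUnix":{"$lt":1533268800000,"$gte":1533182400000}}
```

If the two conditions ever need to stay in separate objects, wrapping them in an explicit $and array keeps both.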
The problem is that mongo doesn't parse operators if they are in quotes.
db.events.count({"startTimeUnix":{"$lt":1533268800000,"$gte":1533182400000}})
means that it looks for the entries where startTimeUnix is an object and contains fields "$lt" and "$gte"
If you run the next command, this query starts returning 1:
db.events.insert({"startTimeUnix":{"$lt":1533268800000,"$gte":1533182400000}})

Implementation of limit in mongodb

My collection is named trial and its data size is 112 MB.
My query is:
db.trial.find()
and I have added a limit of 10:
db.trial.find.limit(10).
But the limit is not working; the entire query is running.
Replace
db.trial.find.limit(10)
with
db.trial.find().limit(10)
Also, you mention that the entire database is being queried? Run this:
db.trial.find().limit(10).explain()
It will tell you how many documents it looked at before stopping the query (nscanned). You will see that nscanned will be 10.
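Why the parentheses matter can be seen in plain JavaScript. In the sketch below, fakeCollection is a hypothetical stand-in for a collection object, not the real shell API: find is a function, and only the value it returns (the cursor) carries a limit method.

```javascript
// Hypothetical stand-in for a collection; NOT the real mongo shell object.
const fakeCollection = {
  docs: Array.from({ length: 100 }, (_, i) => ({ _id: i })),
  find() {
    const docs = this.docs;
    // The "cursor" returned by find() is what carries limit().
    return {
      limit(n) {
        return docs.slice(0, n);
      }
    };
  }
};

// With parentheses: find() runs and its cursor is limited.
const firstTen = fakeCollection.find().limit(10);
console.log(firstTen.length); // prints: 10

// Without parentheses: `find` is the function itself, which has no limit().
console.log(typeof fakeCollection.find.limit); // prints: undefined
```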
The .limit() modifier on its own will only "limit" the results of the query that is processed, so it works as designed to "limit" the results returned. In its raw form, though, with no query conditions, nscanned should simply equal the limit you asked for:
db.trial.find().limit(10)
If your intent is to only operate on a set number of documents you can alter this with the $maxScan modifier:
db.trial.find({})._addSpecial( "$maxScan" , 11 )
Which causes the query engine to "give up" after the set number of documents have been scanned. But that should only really matter when there is something meaningful in the query.
If you are actually trying to do "paging" then you are better off using "range" queries with $gt and $lt and cousins to effectively change the range of selection that is done in your query.
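The range-based paging idea can be sketched as follows. This is an in-memory illustration under stated assumptions: docs is a made-up array standing in for the collection, nextPage is a hypothetical helper, and paging is keyed on _id. Against a real collection the equivalent shape would be a find on {_id: {$gt: lastId}}, sorted by _id and limited to the page size.

```javascript
// Hypothetical in-memory stand-in for the collection (25 documents).
const docs = Array.from({ length: 25 }, (_, i) => ({ _id: i + 1 }));

// Return the next page of documents whose _id is greater than lastId.
// Mirrors the shape: find({_id: {$gt: lastId}}).sort({_id: 1}).limit(pageSize)
function nextPage(allDocs, lastId, pageSize) {
  return allDocs
    .filter((doc) => doc._id > lastId)
    .sort((a, b) => a._id - b._id)
    .slice(0, pageSize);
}

// Walk the collection page by page, remembering the last _id seen.
let lastId = 0;
const pageSizes = [];
for (;;) {
  const page = nextPage(docs, lastId, 10);
  if (page.length === 0) break;
  pageSizes.push(page.length);
  lastId = page[page.length - 1]._id;
}
console.log(pageSizes); // [ 10, 10, 5 ]
```

Each page starts from the last key seen rather than from the beginning, which is the point of using a range condition instead of skipping over earlier results.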

Where do compound indexes in MongoDB come into play?

What advantages do we get from compound indexes? Suppose we have a collection in which I have to index over 2 fields, say key1 and key2. How is having two separate indexes different from having a compound index {key1:1, key2:1}? What's the problem with having 2 separate indexes? Can't MongoDB make use of 2 or more indexes to satisfy a query?
As at MongoDB 2.2:
Every query, including update operations, uses one and only one index.
The query optimizer selects the index empirically by occasionally running alternate query plans and by selecting the plan with the best response time for each query type.
An exception to the above rule is $or queries; each clause is executed in parallel and can use a separate index.
For more information see:
Indexing Overview
Query Optimizer
Explain