How to choose between getPlanCache() and hint() in different situations - mongodb

I have already read the official documentation to get the basic idea of getPlanCache() and hint().
getPlanCache()
Displays the cached query plans for the specified query shape.
The query optimizer only caches the plans for those query shapes that can have more than one viable plan.
Official Documentation: https://docs.mongodb.com/manual/reference/method/PlanCache.getPlansByQuery/
hint()
The $hint operator forces the query optimizer to use a specific index to fulfill the query. Specify the index either by the index name or by document.
Official Documentation: https://docs.mongodb.com/manual/reference/operator/meta/hint/
My Question
If I can make sure the specific collection can cache the plan, I don't need to use hint() to ensure optimized performance. Is that correct?

I have already read the official documentation to get the basic idea of getPlanCache() and hint().
To be clear: these are troubleshooting aids for investigating query performance. The MongoDB query planner chooses the most efficient plan available based on a measure of "work" involved in executing a given query shape. If there is only a single viable plan, there is no need to cache the plan selection. If there are multiple query plans available for the same query shape, the query planner will periodically evaluate performance and update the cached plan selection if appropriate.
The query plan cache methods allow you to inspect and clear information in the plan cache. Generally you would only want to clear the plan cache while investigating issues in a development/staging environment, as this could have a noticeable effect on a busy deployment.
If I can make sure the specific collection can cache the plan, I don't need to use hint() to ensure optimized performance. Is that correct?
In general you should avoid using hint (outside of testing query plans) as this bypasses the query planner and forces use of the hinted index even if there might be a more efficient index available.
If a specific query is not performing as expected, explain() output is the best starting point for insight into the query planning process. If you're not sure how to optimise a specific query, I'd suggest posting a question on DBA StackExchange including the output of explain(true) (verbose explain) and your MongoDB server version.
For a helpful presentation, see: Reading the .explain() Output - Charlie Swanson (June 2017).
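As a rough illustration of reading explain() output, the sketch below walks a winningPlan tree and collects the stage names, so that a COLLSCAN, or the index picked by an IXSCAN, stands out. The helper name collectStages and the sample document are hypothetical; the sample is a hand-written, trimmed stand-in for the classic queryPlanner.winningPlan shape, not real server output.

```javascript
// Hypothetical helper: walk an explain() plan tree depth-first and return
// every stage together with the index it uses (if any).
function collectStages(plan, found = []) {
  if (!plan || typeof plan !== "object") return found;
  if (plan.stage) {
    found.push({ stage: plan.stage, indexName: plan.indexName || null });
  }
  // Child stages live under inputStage (single child) or inputStages (array).
  if (plan.inputStage) collectStages(plan.inputStage, found);
  for (const child of plan.inputStages || []) collectStages(child, found);
  return found;
}

// Hand-written sample in the shape of explain().queryPlanner.winningPlan.
// The index name is an assumption for illustration.
const sampleWinningPlan = {
  stage: "FETCH",
  inputStage: { stage: "IXSCAN", indexName: "name_1_createdAt_1" },
};

const stages = collectStages(sampleWinningPlan);
const usesCollScan = stages.some((s) => s.stage === "COLLSCAN");
console.log(stages.map((s) => s.stage).join(" <- "), "| COLLSCAN:", usesCollScan);
```

In the shell, the input would be something like db.users.find({ name: "alice" }).explain().queryPlanner.winningPlan, where the users collection and the filter are assumptions. If the planner already picks a suitable index here, there is little reason to add hint().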

Related

Does mongodb use index search in lookup stage?

I'm querying a collection with aggregate function in MongoDB and I have to look up some other collections in its aggregation. But I have a question about it:
Does MongoDB use indexes for foreignField? I wasn't able to figure this out and I searched everywhere for this but I didn't get my answer. It must certainly use indexes for it but I just want to be sure.
The best way to determine how the database is executing a query is to generate and examine the explain output for the operation. With aggregations that include the $lookup stage specifically you will want to use the more verbose .explain("executionStats") mode. You may also utilize the $indexStats operator to confirm that the usage count of the intended index is increasing.
The best answer we can give based on the limited information in the question is: MongoDB will probably use the index. Query execution behavior, including index usage, depends on the situation and the version. If you provide more information in your question, then we can provide more specific information. There are also some details about index usage on the $lookup documentation page.
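One way to act on the $indexStats suggestion is to snapshot the stats of the foreign collection before and after running the pipeline and compare the usage counters. The sketch below does only the comparison step; the helper name indexUsageDelta, the index name customerId_1, and the numbers are assumptions, and the snapshots are hand-written stand-ins that keep just the name and accesses.ops fields.

```javascript
// Hypothetical helper: given two snapshots of $indexStats output (arrays of
// documents with `name` and `accesses.ops`), report how many times each
// index was used between them.
function indexUsageDelta(before, after) {
  const baseline = new Map(before.map((s) => [s.name, Number(s.accesses.ops)]));
  const delta = {};
  for (const s of after) {
    delta[s.name] = Number(s.accesses.ops) - (baseline.get(s.name) || 0);
  }
  return delta;
}

// Hand-written snapshots; real ones carry more fields (key, host, since).
const before = [
  { name: "_id_", accesses: { ops: 10 } },
  { name: "customerId_1", accesses: { ops: 4 } },
];
const after = [
  { name: "_id_", accesses: { ops: 10 } },
  { name: "customerId_1", accesses: { ops: 29 } },
];

const usage = indexUsageDelta(before, after);
console.log(usage);
```

In the shell, each snapshot would come from db.<foreignCollection>.aggregate([ { $indexStats: {} } ]).toArray(). A counter that grows on the intended index after the $lookup runs suggests that the index is being used for the join.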

How reliable is MongoDB's query optimizer?

According to MongoDB docs:
For a query, the MongoDB query optimizer chooses and caches the most
efficient query plan given the available indexes.
So if the query optimizer chooses one index over the other according to indexStats, is this a good enough evaluation to delete the unused index and keep only the preferred one?
Or are there edge cases where it makes sense to keep the index that is not preferred by the query optimizer and delete the preferred one?

Mongo - How to explain query without executing it

I'm new to Mongo and have searched but don't see a specific answer.
I understand that Mongo's explain method will execute the query in parallel with the possible access plans and choose a winning plan based on execution time.
The Best Practices Guide states "The query plan can be calculated and returned without first having to run the query". I cannot find how to do this in the doc.
So what if even the winning plan takes a very long time to execute before it returns even a small result set, for example sorting a large collection?
I've seen older comments that execution stops after the first 101 documents are returned, but again can't find that in official doc.
So my question is: How to get the access plan without executing the query?
Thanks for your help. I'm using Mongo 3.4.
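You can set the verbosity of the explain() function. By default it uses the queryPlanner mode, which is (if I'm right) what you are looking for: it reports the winning plan without running the query to completion.
Check the other modes (executionStats and allPlansExecution) in the official MongoDB documentation about explain().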
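To make the verbosity setting concrete, here is a minimal sketch that builds the document for the explain database command around a find. The helper name buildExplainCommand and the collection and field names are assumptions; the three verbosity strings are the modes the explain documentation lists.

```javascript
const VERBOSITY_MODES = ["queryPlanner", "executionStats", "allPlansExecution"];

// Hypothetical helper: build the document for the `explain` database command
// wrapping a find. Collection and field names passed in are illustrative.
function buildExplainCommand(collection, filter, sort, verbosity = "queryPlanner") {
  if (!VERBOSITY_MODES.includes(verbosity)) {
    throw new Error("unknown explain verbosity: " + verbosity);
  }
  const find = { find: collection, filter: filter };
  if (sort) find.sort = sort; // only include a sort when one was given
  return { explain: find, verbosity: verbosity };
}

const cmd = buildExplainCommand("users", { name: "alice" }, { createdAt: -1 });
console.log(JSON.stringify(cmd));
```

In the shell this document could be passed to db.runCommand(cmd). The cursor form db.users.find({ name: "alice" }).sort({ createdAt: -1 }).explain("queryPlanner") asks for the same information.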
MongoDB builds query plans by running a limited subset of a query, then caches the winning plan so it can be reused (until something causes a cache flush).
So the "limited subset" means that each candidate plan is only run until it has produced a small "sample" of matching documents (about the first 101 results), and goes no further, but what the documentation means by
The query plan can be calculated and returned without first having to
run the query
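is that you can do an explain before you run the full query. This is good practice because it shows which plan would be chosen before you pay for the full execution. Note that explain itself neither consults nor populates the plan cache; the winning plan is cached when the query is actually run.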
To put it simply, if MongoDB notices that one candidate plan is taking longer than another that has already finished its trial, it will cease executing it and ditch that plan in favour of the more efficient one. For example, you can see this in action by running explain in allPlansExecution mode on a query where an index is chosen over a collection scan (COLLSCAN).
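To compare the candidates that allPlansExecution reports, a small summary like the following can help. It is a sketch under assumptions: the helper name summariseCandidates is hypothetical, the sample is hand-written rather than real server output, and it only reads the nReturned, totalKeysExamined, totalDocsExamined and executionStages.stage fields of each candidate.

```javascript
// Hypothetical helper: summarise the candidate plans recorded under
// executionStats.allPlansExecution, fewest documents examined first.
function summariseCandidates(explainOutput) {
  const candidates = (explainOutput.executionStats || {}).allPlansExecution || [];
  return candidates
    .map((c) => ({
      rootStage: c.executionStages ? c.executionStages.stage : "unknown",
      nReturned: c.nReturned,
      keysExamined: c.totalKeysExamined,
      docsExamined: c.totalDocsExamined,
    }))
    .sort((a, b) => a.docsExamined - b.docsExamined);
}

// Hand-written sample: a collection scan trial next to an index-backed trial.
const sampleExplain = {
  executionStats: {
    allPlansExecution: [
      { nReturned: 2, totalKeysExamined: 0, totalDocsExamined: 120, executionStages: { stage: "COLLSCAN" } },
      { nReturned: 20, totalKeysExamined: 20, totalDocsExamined: 20, executionStages: { stage: "FETCH" } },
    ],
  },
};

const summary = summariseCandidates(sampleExplain);
console.log(summary);
```

The real input would be the result of something like db.users.find({ name: "alice" }).explain("allPlansExecution"), with the collection and filter being assumptions.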

How to see sort information system.profile (mongodb)?

I enabled profiling level 2 (all operations). My idea is to write a script that goes through all the queries, runs explain on each, and checks whether any of them do not use indexes. However, that's impossible because I don't see any sorting information in system.profile. Why is that?
Thanks
UPDATE:
Imagine you have a users collection with an index on (name, createdAt). Now I would like to find some users sorted by time. In system.profile the second part of the query (sorting/pagination) won't be saved; however, it's very important to understand which sort operations were used, as they affect performance and index selection.
So my intention was to create a script that goes through all statements in system.profile, runs explain on each, and checks whether indexes are used or not. If not, I can automatically catch all new or unreliable queries while executing integration tests.
Why did 10gen choose to not include sorting information in the system.profile?
I'd suggest if you're interested in the feature, to request it here. (I couldn't find an existing suggestion for this feature). You might get more detail about why it hasn't been included.
I can see how it could be valuable sometimes, although it's recommended that you compare the nscanned vs the nreturned as a way of knowing whether indexes are being used (and more importantly, making sure as few documents are scanned as possible).
So, while not perfect, it gives you a quick way of evaluating performance and detecting index usage.
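The nscanned versus nreturned comparison can be scripted along these lines. This is only a sketch: the helper name findInefficientOps, the threshold of 10, and the sample entries are assumptions, and the fallback to docsExamined covers later server versions that use that field name in profile documents.

```javascript
// Hypothetical helper: flag system.profile entries that scan many more
// documents than they return. Reads `nscanned` (older servers) and falls
// back to `docsExamined` (the name used by later server versions).
function findInefficientOps(profileEntries, maxRatio = 10) {
  return profileEntries.filter((op) => {
    const scanned = op.nscanned !== undefined ? op.nscanned : op.docsExamined;
    if (scanned === undefined) return false; // entry carries no scan counters
    const returned = Math.max(op.nreturned || 0, 1); // avoid dividing by zero
    return scanned / returned > maxRatio;
  });
}

// Hand-written stand-ins for documents read from system.profile.
const sampleProfile = [
  { ns: "app.users", nscanned: 50000, nreturned: 20, millis: 180 },
  { ns: "app.users", nscanned: 20, nreturned: 20, millis: 1 },
  { ns: "app.orders", docsExamined: 900, nreturned: 0, millis: 12 },
];

const suspects = findInefficientOps(sampleProfile);
console.log(suspects.map((op) => op.ns + " " + op.millis + "ms"));
```

In the shell, the entries would come from db.system.profile.find().toArray(). The threshold is arbitrary and worth tuning to the workload.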
You can use the aggregation framework to sort, e.g. slowest operations first:
db.system.profile.aggregate([ { $sort: { millis: -1 } } ])
Sorry guys. I found an answer to that question. Actually mongo has it. It's called orderby inside the query element. My bad :-)

Is mongoDB efficient in doing multi-key lookups?

I'm evaluating MongoDB, coming from Membase/memcached, because I want more flexibility.
Of course, Membase is excellent at fast (multi-)key lookups.
I like the additional options that MongoDB gives me, but is it also fast at multi-key lookups? I've seen the $or and $in operators and I'm sure I can model it with those. I just want to know if it's performant (in the same league as Membase).
Use case: e.g., Lucene/Solr returns 20 product IDs. Look up these product IDs in Couchdb to return the docs/appropriate fields.
Thanks,
Geert-Jan
For your use case, I'd say it is, from my experience: I hacked some analytics into a database of mine that made a lot of $in queries with thousands of ids. To my surprise, it worked rather well, in the low-millisecond range.
Of course, it's hard to compare this, and, as usual, theory is a bad companion when it comes to performance. I guess the best way to figure it out is to migrate some test data and send some queries to the system.
Use MongoDB's excellent built-in profiler, use $explain, keep the one index per query rule in mind, take a look at the logs, keep an eye on mongostat, and do some benchmarks. This shouldn't take too long and should give you a definite answer. If your queries turn out slow, people here and on the newsgroup probably have some ideas on how to improve the exact query or the indexing.
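For the use case in the question (a search engine returns a ranked list of product ids, then the documents are fetched in one go), a minimal sketch could look like this. The helper names buildInQuery and orderByIds, the products collection, and the sample data are all assumptions. The re-ordering step is there because $in does not return documents in the order of the id array.

```javascript
// Hypothetical helper: build the filter for a multi-key lookup by _id.
function buildInQuery(ids) {
  return { _id: { $in: ids } };
}

// Hypothetical helper: re-order fetched documents to follow the ranking of
// the incoming ids; ids with no matching document are skipped.
function orderByIds(ids, docs) {
  const byId = new Map(docs.map((d) => [String(d._id), d]));
  return ids.map((id) => byId.get(String(id))).filter((d) => d !== undefined);
}

const productIds = ["p3", "p1", "p2", "p9"]; // e.g. ranked ids from the search engine
// Stand-in for db.products.find(buildInQuery(productIds)).toArray()
const fetched = [
  { _id: "p1", title: "Kettle" },
  { _id: "p2", title: "Toaster" },
  { _id: "p3", title: "Blender" },
];

const ordered = orderByIds(productIds, fetched);
console.log(ordered.map((d) => d._id));
```

Because _id is always indexed, a lookup like this is served by a single index, which fits the one index per query advice above.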
One index per query. It's sometimes thought that queries on multiple
keys can use multiple indexes; this is not the case with MongoDB. If
you have a query that selects on multiple keys, and you want that
query to use an index efficiently, then a compound-key index is
necessary.
http://www.mongodb.org/display/DOCS/Indexing+Advice+and+FAQ#IndexingAdviceandFAQ-Oneindexperquery.
There's more information on that page as well with regard to indexes.
The bottom line is that Mongo will be great if your indexes are in memory and you are indexing on the columns you want to query using composite keys. If you have poor indexing, your performance will suffer as a result. This is pretty much in line with most systems.