Mongo - How to explain query without executing it - mongodb

I'm new to Mongo and have searched but don't see a specific answer.
I understand that Mongo's explain method will execute the query with the possible access plans in parallel and choose a winning plan based on execution time.
The Best Practices Guide states "The query plan can be calculated and returned without first having to run the query". I cannot find how to do this in the doc.
So what if even the winning plan takes a very long time to execute before it returns even a small result set, for example sorting a large collection?
I've seen older comments that execution stops after the first 101 documents are returned, but again can't find that in official doc.
So my question is: How to get the access plan without executing the query?
Thanks for your help. I'm using Mongo 3.4.

You can set the verbosity of the explain() method. By default it uses the "queryPlanner" verbosity mode, which returns the plan without executing the query and is (if I'm right) what you are looking for.
Check the other modes ("executionStats" and "allPlansExecution") in the official MongoDB documentation about explain().
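For example, in the mongo shell (collection, field, and filter here are only illustrative):

```javascript
// "queryPlanner" (the default) returns the winning and rejected plans
// without executing the query:
db.users.find({ age: { $gt: 30 } }).sort({ name: 1 }).explain("queryPlanner")

// "executionStats" and "allPlansExecution" DO execute the query,
// so avoid them if the query itself is too slow to run:
db.users.find({ age: { $gt: 30 } }).explain("executionStats")
```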

MongoDB builds query plans by running the candidate plans against a limited subset of the query, then caches the winning plan so it can be re-used (until something causes a cache flush).
The "limited subset" means that the candidate plans race each other only until one of them returns the first 101 matching documents as a "sample", and go no further. What the documentation means by
The query plan can be calculated and returned without first having to
run the query
is that you can run an explain before you run the full query and inspect the winning plan ahead of time.
To put it simply, during plan selection, if MongoDB notices that one candidate plan is taking longer than a plan that has already completed the trial, it will cease executing the slower plan and ditch it in favour of the more efficient one. You can see this in action by running an allPlansExecution explain where an index scan is chosen over a collection scan.
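A sketch of how to observe this (note that allPlansExecution does execute the query; names are examples):

```javascript
// Shows per-plan trial statistics, so you can compare the winning
// index scan against the rejected plans (e.g. a COLLSCAN):
db.users.find({ age: { $gt: 30 }, city: "Oslo" }).explain("allPlansExecution")
```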

Related

How to apply/choose getPlanCache() and hint() depending on different situations

I already read official documentation to get the basic idea on getPlanCache() and hint().
getPlanCache()
Displays the cached query plans for the specified query shape.
The query optimizer only caches the plans for those query shapes that can have more than one viable plan.
Official Documentation: https://docs.mongodb.com/manual/reference/method/PlanCache.getPlansByQuery/
hint()
The $hint operator forces the query optimizer to use a specific index to fulfill the query. Specify the index either by the index name or by document.
Official Documentation: https://docs.mongodb.com/manual/reference/operator/meta/hint/
MyQuestion
If I can make sure the specific collection can cache the plan, I don't need to use hint() to ensure optimized performance. Is that correct?
I already read official documentation to get the basic idea on getPlanCache() and hint().
To be clear: these are troubleshooting aids for investigating query performance. The MongoDB query planner chooses the most efficient plan available based on a measure of "work" involved in executing a given query shape. If there is only a single viable plan, there is no need to cache the plan selection. If there are multiple query plans available for the same query shape, the query planner will periodically evaluate performance and update the cached plan selection if appropriate.
The query plan cache methods allow you to inspect and clear information in the plan cache. Generally you would only want to clear the plan cache while investigating issues in a development/staging environment, as doing so could have a noticeable effect on a busy deployment.
If I can make sure the specific collection can cache the plan, I don't need to use hint() to ensure optimized performance. Is that correct?
In general you should avoid using hint (outside of testing query plans) as this bypasses the query planner and forces use of the hinted index even if there might be a more efficient index available.
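For testing purposes, hint() is typically combined with explain so you can compare the forced plan against the planner's choice (index and filter below are examples):

```javascript
// Force the { age: 1 } index and look at the resulting execution stats;
// compare with the same explain run without the hint:
db.users.find({ age: { $gt: 30 } }).hint({ age: 1 }).explain("executionStats")
```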
If a specific query is not performing as expected, explain() output is the best starting point for insight into the query planning process. If you're not sure how to optimise a specific query, I'd suggest posting a question on DBA StackExchange including the output of explain(true) (verbose explain) and your MongoDB server version.
For a helpful presentation, see: Reading the .explain() Output - Charlie Swanson (June 2017).

MongoDB Aggregation V/S simple query performance?

I am re-asking this question as I thought it should be on a separate thread from this one: in-mongodb-know-index-of-array-element-matched-with-in-operator.
I am using MongoDB, and until now I was writing all of my queries using simple operations such as find and update (no aggregations). Then I read many SO posts, see this one for example: mongodb-aggregation-match-vs-find-speed. I worried about increasing computation time on the server, because the more I compute there, the higher its load becomes, so I tried using aggregations and thought I was going in the right direction. But later, on my previous question, andreas-limoli told me not to use aggregations, as they are slow, and to use simple queries and compute on the server instead. Now I am literally in a dilemma about which one to use. I have been working with MongoDB for a year, but I have no knowledge about its performance as data size increases, so I don't know which one to pick.
Also, one more thing I didn't find anywhere: if aggregation is slower, is it because of $lookup or not? $lookup is the foremost reason I thought about using aggregation, because otherwise I have to execute many queries serially and then compute on the server, which looks very poor compared to aggregation.
I also read about the 100MB restriction on MongoDB aggregation when passing data from one pipeline stage to the next, so how do people handle that case efficiently? And if they turn on disk usage, given that disk usage slows everything down, how do they handle that case?
I also fetched a 30,000-document sample collection and ran an aggregation with $match against a find query, and I found that the aggregation was a little faster: the aggregation took 180 ms to execute whereas find took 220 ms.
Please help me out, it would be really helpful for me.
Aggregation pipelines are costly queries. They can impact your performance as your data grows, because of the CPU and memory they consume. If you can achieve the same result with a find query, go for it, because aggregation becomes costlier as the data in the DB increases.
The aggregation framework in MongoDB is similar to join operations in SQL. Aggregation pipelines are generally resource-intensive operations, so if your workload can be satisfied with simple queries, you should use those in the first place.
However, if it is absolutely necessary, you can use aggregation pipelines when you need to fetch data from multiple collections.
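To illustrate the trade-off described above, a $match-only pipeline and the equivalent find query can use the same index, while memory-hungry stages can be allowed to spill to disk (collection, field names, and the index are assumptions for the example):

```javascript
// Both of these can use an index on { status: 1 }:
db.orders.find({ status: "shipped" })
db.orders.aggregate([ { $match: { status: "shipped" } } ])

// For pipelines whose stages may exceed the 100 MB in-memory limit,
// allowDiskUse lets those stages write temporary files to disk
// (slower, but the pipeline completes instead of erroring out):
db.orders.aggregate(
  [ { $match: { status: "shipped" } }, { $sort: { total: -1 } } ],
  { allowDiskUse: true }
)
```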

How to see sort information system.profile (mongodb)?

I enabled profiling mode 2 (all events). My idea is to write a script that will go through all the queries, execute explain plans for them, and see whether any of the queries do not use indexes. However, it's impossible to do so, as I don't see any sorting information in system.profile. Why is that?
Thanks
UPDATE:
Imagine you have a users collection, and you have created an index on this collection: user(name, createdAt). Now I would like to find some users sorted by time. In system.profile the second part of the query (sorting/pagination) isn't saved, yet it's very important to understand which sort operations were used, as they affect performance and index selection.
So my intention was to create a script that goes through all the statements in system.profile, executes explain plans, and sees whether indexes are used or not. If not, I can automatically catch all new/unreliable queries while executing integration tests.
Why did 10gen choose to not include sorting information in the system.profile?
If you're interested in the feature, I'd suggest requesting it here. (I couldn't find an existing suggestion for this feature.) You might get more detail about why it hasn't been included.
I can see how it could be valuable sometimes, although it's recommended that you compare the nscanned vs the nreturned as a way of knowing whether indexes are being used (and more importantly, making sure as few documents are scanned as possible).
So, while not perfect, it gives you a quick way of evaluating performance and detecting index usage.
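A hypothetical sketch of the kind of script the question describes, which replays each profiled query through explain() and flags collection scans. It assumes the older system.profile schema where read operations carry op: "query" and the filter in a query field, and it only checks the top-level winning-plan stage, so treat it as a starting point rather than a complete checker:

```javascript
db.system.profile.find({ op: "query" }).forEach(function (entry) {
  // entry.ns is "dbName.collName", e.g. "mydb.users"
  var parts = entry.ns.split(".");
  var coll = db.getSiblingDB(parts[0]).getCollection(parts.slice(1).join("."));
  var plan = coll.find(entry.query).explain("queryPlanner");
  // A top-level COLLSCAN means no index was used for this query shape
  if (plan.queryPlanner.winningPlan.stage === "COLLSCAN") {
    print("No index used for: " + entry.ns + " " + tojson(entry.query));
  }
})
```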
You can use the aggregation framework to sort, e.g. slowest operations first:
db.system.profile.aggregate([ { $sort : { millis : -1 } } ])
Sorry guys. I found an answer to that question. Actually mongo has it. It's called orderby inside the query element. My bad :-)

mongodb get count without repeating find

When performing a query in MongoDb, I need to obtain a total count of all matches, along with the documents themselves as a limited/paged subset.
I can achieve the goal with two queries, but I do not see how to do this with one query. I am hoping there is a mongo feature that is, in some sense, equivalent to SQL_CALC_FOUND_ROWS, as it seems like overkill to have to run the query twice. Any help would be great. Thanks!
EDIT: Here is Java code to do the above.
DBCursor cursor = collection.find(searchQuery).limit(10);
System.out.println("total objects = " + cursor.count());
I'm not sure which language you're using, but you can typically call a count method on the cursor that's the result of a find query and then use that same cursor to obtain the documents themselves.
It's not only overkill to run the query twice, but there is also the risk of inconsistency. The collection might change between the two queries, or each query might be routed to a different peer in a replica set, which may have different versions of the collection.
The count() function on cursors (in the MongoDB JavaScript shell) really runs another query. You can see that by typing "cursor.count" (without parentheses), so it is no better than running two queries.
In the C++ driver, cursors don't even have a "count" function. There is "itcount", but it only loops over the cursor and fetches all results, which is not what you want (for performance reasons). The Java driver also has "itcount", and there the documentation says that it should be used for testing only.
It seems there is no way to do a "find some and get total count" consistently and efficiently.
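For what it's worth, from MongoDB 3.4 onwards the $facet aggregation stage can return both the total count and one page of results from a single query, which avoids the inconsistency between two separate queries (collection, filter, and page size below are examples):

```javascript
db.items.aggregate([
  { $match: { category: "books" } },
  { $facet: {
      // total number of matches across the whole result set
      total: [ { $count: "count" } ],
      // one page of the matching documents
      page:  [ { $skip: 0 }, { $limit: 10 } ]
  } }
])
```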

Track MongoDB performance?

Is there a way to track 'query' performance in MongoDB? Especially for testing indexes or subdocuments?
In SQL you can run queries and see execution time and other analytic metrics.
I have a huge MongoDB collection and want to try different variations and indexes, but I'm not sure how to test them; it would be nice to see how long it took to find a record. (I am new to MongoDB.) Thanks
There are two things here that you'll likely be familiar with.
Explain plans
Slow Logs
Explain Plans
Here are some basic docs on explain. Running explain is as simple as db.foo.find(query).explain(). (note that this actually runs the query, so if your query is slow this will be too)
To understand the output, you'll want to check some of the docs on the slow logs below. You're basically given details about "how much index was scanned", "how many are found", etc. As is the case with such performance details, interpretation is really up to you. Read the docs above and below to point you in the right direction.
Slow Logs
By default, slow logs are active with a threshold of 100ms. Here's a link to the full documentation on profiling. A couple of key points to get you started:
Get/Set profiling:
db.setProfilingLevel(2); // 0 => none, 1 => slow, 2 => all
db.getProfilingLevel();
See slow queries:
db.system.profile.find()
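In practice you'll usually filter and sort the profile entries, for example (the 100 ms threshold is just an illustration):

```javascript
// Only operations slower than 100 ms, newest first:
db.system.profile.find({ millis: { $gt: 100 } }).sort({ ts: -1 })
```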
MongoDB has a query profiler you can turn on.
See: http://www.mongodb.org/display/DOCS/Database+Profiler
I recommend, based on my experience, reading the mongod logs with the support of mtools ( https://github.com/rueckstiess/mtools ); it has lots of features which help with reading the output.
For instance, I found the mlogfilter command very useful, e.g.:
mlogfilter.exe [PATH_TO_FILE_LOG] --slow --json | mongoimport --db logsTest -c logCollection --drop
Launching this command, you will record all the slow queries in a collection.
Activating the MongoDB profiler could be a problem for performance in a production environment, so you can achieve the same results by reading the logs.
Consider also that the system.profile collection is capped, so you should check the size of the capped collection, because otherwise you might not find the queries you're looking for.
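To check and enlarge the capped system.profile collection, the documented procedure is to disable profiling, recreate the collection with a bigger size, and re-enable profiling (the 64 MB size below is an arbitrary example):

```javascript
// Current maximum size of the capped profile collection, in bytes:
db.system.profile.stats().maxSize

// Resize it: profiling must be off while the collection is recreated
db.setProfilingLevel(0)
db.system.profile.drop()
db.createCollection("system.profile", { capped: true, size: 64 * 1024 * 1024 })
db.setProfilingLevel(2)
```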