Is there a way to track 'query' performance in MongoDB? Especially for testing indexes or subdocuments?
In SQL you can run queries and see execution time and other analytic metrics.
I have a huge MongoDB collection and want to try different variations and indexes, but I'm not sure how to test them; it would be nice to see how long it took to find a record. (I am new to MongoDB.) Thanks
There are two things here that you'll likely be familiar with.
Explain plans
Slow Logs
Explain Plans
Here are some basic docs on explain. Running explain is as simple as db.foo.find(query).explain(). (note that this actually runs the query, so if your query is slow this will be too)
To understand the output, you'll want to check some of the docs on the slow logs below. You're basically given details about "how much index was scanned", "how many are found", etc. As is the case with such performance details, interpretation is really up to you. Read the docs above and below to point you in the right direction.
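To make the "interpretation is up to you" part concrete, here is a minimal standalone JavaScript sketch. The explain-style document below is made up, and the field names (nscanned, n, millis) follow the classic explain output; newer servers report executionStats.totalDocsExamined and nReturned instead.

```javascript
// Made-up object shaped like classic explain() output; field names (nscanned, n)
// match older MongoDB versions, while newer servers report
// executionStats.totalDocsExamined and nReturned instead.
const explainOutput = {
  cursor: "BtreeCursor name_1",
  nscanned: 120, // index entries examined
  n: 100,        // documents returned
  millis: 4
};

// Simple heuristic: the closer n is to nscanned, the better the index fits the query.
function indexEfficiency(plan) {
  return plan.n / Math.max(plan.nscanned, 1);
}

console.log(indexEfficiency(explainOutput).toFixed(2)); // "0.83"
```

A ratio near 1 means almost every scanned entry was returned; a ratio near 0 means the server did a lot of scanning for little result, which usually points at a missing or poorly chosen index.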
Slow Logs
By default, slow-operation logging is active with a threshold of 100ms (the profiler itself is off until you enable it). Here's a link to the full documentation on profiling. A couple of key points to get you started:
Get/Set profiling:
db.setProfilingLevel(2); // 0 => none, 1 => slow, 2 => all
db.getProfilingLevel();
See slow queries:
db.system.profile.find()
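Once profiling is on, system.profile fills up quickly. A sketch like the following (standard mongo shell syntax, run against your own database) narrows it down to the slowest recent operations:

```javascript
// Show the 5 slowest profiled operations, most recent first
db.system.profile.find({ millis: { $gt: 100 } }).sort({ ts: -1 }).limit(5).pretty()
```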
MongoDB has a query profiler you can turn on.
See: http://www.mongodb.org/display/DOCS/Database+Profiler
Based on my experience, I recommend reading the MongoDB logs with the help of mtools ( https://github.com/rueckstiess/mtools ); it has lots of features that make the output easier to read.
For instance, I found the mlogfilter command very useful, e.g.:
mlogfilter.exe [PATH_TO_FILE_LOG] --slow --json | mongoimport --db logsTest -c logCollection --drop
Running this command records all the slow queries in a collection.
Activating the MongoDB profiler can be a performance problem in a production environment, so you can achieve the same results by reading the logs instead.
Also consider that the system.profile collection is capped, so you should check the size of the capped collection; otherwise you may not find the queries you're looking for.
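If the capped collection turns out to be too small, the documented way to enlarge it is to recreate it while profiling is off. A sketch (the 4MB size here is arbitrary; pick your own):

```javascript
// Check the current size of the capped system.profile collection
db.system.profile.stats()

// To enlarge it: turn profiling off, drop and recreate with a bigger size, re-enable
db.setProfilingLevel(0)
db.system.profile.drop()
db.createCollection("system.profile", { capped: true, size: 4 * 1024 * 1024 }) // 4MB here
db.setProfilingLevel(1)
```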
Related
I'm new to Mongo and have searched but don't see a specific answer.
I understand that Mongo's explain method will execute the query in parallel with the possible access plans and choose a winning plan based on execution time.
The Best Practices Guide states "The query plan can be calculated and returned without first having to run the query". I cannot find how to do this in the doc.
So what if even the winning plan takes a very long time to execute before it returns even a small result set, for example sorting a large collection?
I've seen older comments that execution stops after the first 101 documents are returned, but again can't find that in official doc.
So my question is: How to get the access plan without executing the query?
Thanks for your help. I'm using Mongo 3.4.
You can set the verbosity of the explain() function. By default it uses queryPlanner mode, which (if I'm right) is what you are looking for.
Check the other modes in the official MongoDB documentation about explain()
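For example (collection and field names here are placeholders):

```javascript
// queryPlanner (the default): plan selection only; the full query is not executed
db.foo.find({ user: "abc" }).explain("queryPlanner")

// executionStats: runs the winning plan and reports documents examined, time, etc.
db.foo.find({ user: "abc" }).explain("executionStats")
```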
MongoDB builds Query Plans by running a limited subset of a query, then caches the winning plan so it can be re-used (until something causes a cache flush).
So the "limited subset" means that it will get documents that match, up to 100 docs as a "sample", and go no further. But what the documentation means by
"The query plan can be calculated and returned without first having to run the query"
is that you can do an explain before you run the full query. This is good practice because it then populates the Plan Cache for you ahead of time.
To put it simply, if MongoDB notices that another query is taking longer than a completed query, it will cease execution and ditch the plan in favour of the more efficient plan. For example, you can see this in action by running the allPlansExecution explain where an index is chosen over a colscan.
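To see the rejected plans alongside the winner (again with placeholder names):

```javascript
// Runs all candidate plans for a trial period and reports stats for each,
// so you can see the rejected COLLSCAN next to the winning IXSCAN
db.foo.find({ user: "abc" }).explain("allPlansExecution")
```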
I am using mongodb via the mongo shell to query a large collection. For some reason after 90 seconds the mongo shell seems to be stopping my query and nothing is returned.
I have tried the following two commands but neither will return anything. After 90 seconds it just gives me a new line that I can type in another command.
db.cards.find({"Field":"Something"}).maxTimeMS(9999999)
db.cards.find({"Field":"Something"}).addOption(DBQuery.Option.tailable)
db.cards.find() returns results, but anything with parameters times out at exactly 90 seconds and nothing is returned.
Any help would be greatly appreciated.
Given the level of detail in your question, I am going to focus on 'query a large collection' and guess that you are using the MMAPv1 storage engine, with no index coverage on your query.
Are you disk bound?
Given the above assumptions, you could be cycling data between RAM and disk. Mongo imposes in-memory limits on certain operations (on the order of 100MB), so if your query has to examine a lot of documents (no index coverage), paging data from disk to RAM could be the culprit. I have heard of the mongo shell behaving as you describe, or locking/terminating, when memory constraints are exceeded.
32bit systems can also impose severe memory limits for large collections.
You could look at your OS specific disk activity monitor to get a clue into whether this is your problem.
Just how large is your collection?
Next, how big is your collection? You can run show collections to see the physical size of the collection, and db.cards.count() to see your record count. This helps quantify "large collection".
NOTE: you might need the mongo-hacker extensions to see collection disk use in show collections.
Mongo shell investigation
Within the mongo shell, you have a couple more places to look.
By default, mongo will log slow queries (> 100ms). After your 90 sec timeout:
db.adminCommand({getLog: "global" })
and look for slow query log entries.
Next look at your winning query plan.
var e = db.cards.explain()
e.find({"Field":"Something"})
I am guessing you will see
"stage": "COLLSCAN",
Which means you are doing a full collection scan and you need index coverage for your query (good idea for queries and sorts).
Suggestions
You should have at least partial index coverage on any production query. A proper index should solve your problem (assuming you don't have documents > 16MB).
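For example, assuming "Field" is the field from your query, a minimal index plus a re-check of the plan might look like:

```javascript
// A simple ascending index on the queried field
db.cards.createIndex({ "Field": 1 })

// Re-check the winning plan; you should now see "stage": "IXSCAN"
db.cards.explain().find({ "Field": "Something" })
```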
Another approach (that I don't recommend - indexing is better) is to use a cursor instead
var cursor = db.cards.find({"Field":"Something"})
while (cursor.hasNext()) {
print(tojson(cursor.next()));
}
Depending on the root cause, this may work for you.
I'd like to get some knowledge on reindexing in MongoDB. Please forgive me as I am asking a somewhat subjective question.
The question is: does MongoDB need to be reindexed periodically, as we do for an RDBMS, or does Mongo manage it automatically?
Thanks for your feedback
MongoDB takes care of indexes during routine updates. This operation may be expensive for collections that have a large amount of data and/or a large number of indexes. For most users, the reIndex command is unnecessary. However, it may be worth running if the collection size has changed significantly or if the indexes are consuming a disproportionate amount of disk space.
Call reIndex using the following form:
db.collection.reIndex();
Reference : https://docs.mongodb.com/manual/reference/method/db.collection.reIndex/
That's a good question, because nowhere in the documentation does it mention explicitly that indexes are automatically maintained*. But, they are. You rarely need to reindex manually.
*I filed a bug for that, go vote for it :)
How does Meteor handle DB indexing? I've read that there are no indexes at this time, but I'm particularly concerned about very large data sets, joined with multiple lookups, etc., which will really impact performance. Are these issues taken care of by Mongo and Meteor?
I am coming from a Rails/PostgreSQL background and am about 2 days into Meteor and Mongo.
Thanks.
Meteor does expose a method for creating indexes, which maps to the mongo method db.collection.ensureIndex
You can access it on each Meteor.Collection instance, on the server. For Example:
if (Meteor.isServer) {
  var myCollection = new Meteor.Collection("dummy");
  // create a compound index on 'dummy' over field1 & field2
  myCollection._ensureIndex({field1: 1, field2: 1});
}
From a performance POV, create indexes based on what you publish, but avoid over-indexing.
With oplog tailing, the initial query will only run occasionally, and subsequent changes come from the oplog.
Without oplog tailing, Meteor will re-run the query every 10s, so better indexes yield a large gain.
Got a response from the Discover Meteor book folks:
Sacha Greif Mod − Actually, we are in the process of writing a new
sidebar to address migrations. You'll have access to it for free if
you're on the Full or Premium packages :)
Regarding indexes, I think we might address that in an upcoming blog
post :)
Thanks much for the reply. I'm looking forward to both.
I enabled profiling mode 2 (all events). My idea is to write a script that will go through all the queries, run explain plans on them, and see whether any of the queries do not use indexes. However, it's impossible to do so, as I don't see any sorting information in system.profile. Why is that?
Thanks
UPDATE:
Imagine you have a users collection, and you have created an index on it: user(name, createdAt). Now I would like to fetch some users sorted by time. In system.profile the second part of the query (sorting/pagination) isn't saved; however, it's very important to understand which sort operations were used, as they affect performance and index selection.
So my intention was to create a script that goes through all statements in system.profile, runs explain plans on them, and checks whether indexes are used or not. If not, I can automatically catch all new/unreliable queries while running integration tests.
Why did 10gen choose to not include sorting information in the system.profile?
I'd suggest if you're interested in the feature, to request it here. (I couldn't find an existing suggestion for this feature). You might get more detail about why it hasn't been included.
I can see how it could be valuable sometimes, although the recommended approach is to compare nscanned vs nreturned as a way of knowing whether indexes are being used (and, more importantly, making sure as few documents as possible are scanned).
So, while not perfect, it gives you a quick way of evaluating performance and detecting index usage.
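That nscanned-vs-nreturned comparison is easy to script. Here is a minimal standalone JavaScript sketch; the profile documents are made-up samples, and the field names follow the classic profiler output:

```javascript
// Made-up sample documents shaped like classic system.profile entries
const profileDocs = [
  { op: "query", ns: "test.users", nscanned: 5000, nreturned: 10, millis: 120 },
  { op: "query", ns: "test.users", nscanned: 12,   nreturned: 10, millis: 1 }
];

// Flag operations that scanned far more documents than they returned,
// a strong hint that no suitable index was used
function probablyUnindexed(doc, ratio) {
  return doc.nscanned > doc.nreturned * (ratio || 10);
}

const suspects = profileDocs.filter(function (d) { return probablyUnindexed(d); });
console.log(suspects.length); // 1
```

The ratio threshold (10 here) is arbitrary; tune it to how strict you want the check to be in your integration tests.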
You can use the aggregation framework to sort
e.g. slowest operations first
db.system.profile.aggregate([ { $sort : { millis : -1 } } ])
Sorry guys. I found an answer to that question. Actually mongo has it. It's called orderby inside the query element. My bad :-)