How to measure the query run time in MongoDB

I am trying to measure the query run time in MongoDB.
Steps:
I enabled profiling in MongoDB and ran my query.
When I ran show profile, I got the output below.
db.blogpost.find({post:/.* NATO .*/i})
blogpost is the collection name; I searched for the keyword "NATO" in the query.
Output: the query pulled 20 records, and after running it with profiling enabled I got the output below.
In the output I can see three time values. Which one corresponds to the query duration in MySQL?
query blogtrackernosql.blogpost **472ms** Wed Apr 11 2018 20:37:54
command:{
"find" : "blogpost",
"filter" : {
"post" : /.* NATO .*/i
},
"$db" : "blogtrackernosql"
} cursorid:99983342073 keysExamined:0 docsExamined:1122 numYield:19 locks:{
"Global" : {
"acquireCount" : {
"r" : NumberLong(40)
}
},
"Database" : {
"acquireCount" : {
"r" : NumberLong(20)
}
},
"Collection" : {
"acquireCount" : {
"r" : NumberLong(20)
}
}
} nreturned:101 responseLength:723471 protocol:op_msg planSummary:COLLSCAN
execStats:{
**"stage"** : "COLLSCAN",
"filter" : {
"post" : {
"$regex" : ".* NATO .*",
"$options" : "i"
}
},
"nReturned" : 101,
**"executionTimeMillisEstimate" : 422**,
"works" : 1123,
"advanced" : 101,
"needTime" : 1022,
"needYield" : 0,
"saveState" : 20,
"restoreState" : 19,
"isEOF" : 0,
"invalidates" : 0,
"direction" : "forward",
"docsExamined" : 1122
} client:127.0.0.1 appName:MongoDB Shell allUsers:[ ] user:

This ...
"executionTimeMillisEstimate" : 422
... is MongoDB's estimate of how long that query takes to execute on the MongoDB server.
This ...
query blogtrackernosql.blogpost 472ms
... must be the end-to-end time, including some client-side work (e.g. forming the query and sending it to the MongoDB server) plus the data transfer time from the MongoDB server back to your client.
So:
472ms is the total start-to-finish time
422ms is the time spent inside the MongoDB server
Note: the output also tells you that MongoDB had to scan the entire collection ("stage": "COLLSCAN") to perform this query. FWIW, the reason it has to scan the collection is that you are using a case-insensitive $regex. According to the docs:
Case insensitive regular expression queries generally cannot use indexes effectively.
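If you want to reproduce these numbers yourself, a minimal sketch (reusing the collection and database names from the question) is to read the server-side duration from the profiler and compare it with explain():
// enable profiling for all operations on the current database
db.setProfilingLevel(2)
db.blogpost.find({post:/.* NATO .*/i}).toArray()
// "millis" is the server-side duration the profiler recorded for the operation
db.system.profile.find({ns: "blogtrackernosql.blogpost"}, {millis: 1, ts: 1}).sort({ts: -1}).limit(1)
// executionTimeMillis from explain() is another server-side measurement, taken on a fresh run
db.blogpost.find({post:/.* NATO .*/i}).explain("executionStats").executionStats.executionTimeMillis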

Related

Mongoose is querying secondary instead of primary server

For some unknown reason, mongoose is querying my secondary MongoDB server, and I can't figure out how to change that.
I've set db.setProfilingLevel(2) on my secondary server, and I see a lot of queries there for no reason.
When I view the records, I see:
"command" : {
"$readPreference" : {
"mode" : "secondaryPreferred"
}
}
This is odd because, according to the documentation, the default read preference should be primary.
When I run db.getMongo().getReadPref() I see that indeed that's the case:
ReadPreference {
mode: 'primary',
tags: undefined,
hedge: undefined,
maxStalenessSeconds: undefined,
minWireVersion: undefined
}
I also tried adding {readPreference: 'primary'} to my mongoose connection, but the issue remains the same.
Any suggestions where the secondaryPreferred setting might be coming from?
(I am not sure if my issue is with mongoose or MongoDB, so I've tagged them both)
Update
A full entry from the profiler on the SECONDARY server:
{
"op" : "query",
"ns" : "***",
"command" : {
"find" : "***",
"batchSize" : 1,
"singleBatch" : true,
"maxTimeMS" : 1000,
"$readPreference" : {
"mode" : "secondaryPreferred"
},
"readConcern" : {
"level" : "local"
},
"$db" : "***"
},
"keysExamined" : 0,
"docsExamined" : 1,
"cursorExhausted" : true,
"numYield" : 0,
"nreturned" : 1,
"queryHash" : "17830885",
"queryExecutionEngine" : "classic",
"locks" : {
"FeatureCompatibilityVersion" : {
"acquireCount" : {
"r" : NumberLong(1)
}
},
"Global" : {
"acquireCount" : {
"r" : NumberLong(1)
}
},
"Mutex" : {
"acquireCount" : {
"r" : NumberLong(1)
}
}
},
"flowControl" : {},
"readConcern" : {
"level" : "local",
"provenance" : "clientSupplied"
},
"responseLength" : 0,
"protocol" : "op_msg",
"millis" : 0,
"planSummary" : "COLLSCAN",
"execStats" : {
"stage" : "COLLSCAN",
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
"works" : 2,
"advanced" : 1,
"needTime" : 1,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 0,
"direction" : "forward",
"docsExamined" : 1
},
"ts" : ISODate("2022-10-01T19:00:03.842+07:00"),
"client" : "***", //IP address of the PRIMARY server
"allUsers" : [],
"user" : ""
}
Update 2
I can't replicate the issue in the dev environment, so I'm guessing it's not a mongoose issue but something related to the server setup.
Update 3
When looking at the profiler log again, I noticed that the client is the PRIMARY server IP, and not the app server.
Update 3
When looking at the profiler log again, I noticed that the client is the PRIMARY server IP, and not the app server.
This is super helpful information and what I was attempting to ask about in my comment.
Based on this, I suspect what is happening here is that this profiler entry is associated with a Mirrored Read. Borrowing some from the documentation:
Mirrored reads reduce the impact of primary elections following an outage or planned maintenance. After a failover in a replica set, the secondary that takes over as the new primary updates its cache as new queries come in. While the cache is warming up performance can be impacted.
Starting in version 4.4, mirrored reads pre-warm the caches of electable secondary replica set members. To pre-warm the caches of electable secondaries, the primary mirrors a sample of the supported operations it receives to electable secondaries.
One way to quickly prove or disprove this hypothesis would be to disable mirrored reads in the production environment. Instructions for doing so can be found here and it involves setting the samplingRate to 0.0.
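As a sketch of what that looks like (run against the primary; mirrorReads is a MongoDB 4.4+ server parameter):
// disable mirrored reads by setting the sampling rate to 0
db.adminCommand({ setParameter: 1, mirrorReads: { samplingRate: 0.0 } })
// verify the current setting
db.adminCommand({ getParameter: 1, mirrorReads: 1 })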
Overall, what you are observing is probably expected behavior. It has only become visible because you are inspecting the profiler, which records all operations, and it is not something to be concerned about. It sounds like the application itself is configured appropriately and is using the primary read preference as designed.

Under what circumstances would Mongo use a compound index for a query that does not match the prefix fields of the index?

When explaining a query on a collection having these indexes:
{"user_id": 1, "req.time_us": 1}
{"user_id": 1, "req.uri":1, "req.time_us": 1}
with command like:
db.some_collection.find({"user_id":12345,"req.time_us":{"$gte":1657509059545812,"$lt":1667522903018337}}).limit(20).explain("executionStats")
The winning plan was:
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 20,
"executionTimeMillisEstimate" : 0,
"works" : 20,
"advanced" : 20,
...
"keyPattern" : {
"user_id" : 1,
"req.uri" : 1,
"req.time_us" : 1
},
"indexName" : "user_id_1_req.uri_1_req.time_us_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"user_id" : [ ],
"req.uri" : [ ],
"req.time_us" : [ ]
},
...
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"user_id" : [
"[23456.0, 23456.0]"
],
"req.uri" : [
"[MinKey, MaxKey]"
],
"req.time_us" : [
"[1657509059545812.0, 1667522903018337.0)"
]
},
"keysExamined" : 20,
"seeks" : 1,
...
}
Why was the index user_id_1_req.uri_1_req.time_us_1 used and not user_id_1_req.time_us_1, given that the official manual says a compound index can support queries that match the prefix fields of the index?
This behavior is explained in the documentation on query plans. To paraphrase:
MongoDB runs the query optimizer to choose the winning plan and executes the winning plan to completion.
During plan selection, if more than one index can satisfy a query, MongoDB runs a trial using all the valid plans to determine which one performs best. You can read more about this process here.
As of MongoDB 3.4.6, plan selection involves running candidate plans in parallel in a "race" to see which candidate plan returns 101 results first.
So basically these two indexes had a mini competition and the "wrong" index won. This can happen because such competitions can be heavily skewed by the data distribution when the indexes are similar.
(For example, imagine the first 101 documents in the collection match the query; then the "better" index will actually be slower, because it keeps scanning deeper into the index tree while the "worse" index starts fetching documents immediately.)
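If you want to see the outcome of that trial for your own query, the allPlansExecution verbosity of explain() reports the work done by each candidate plan. A sketch reusing the query from the question:
db.some_collection.find(
    {"user_id": 12345, "req.time_us": {"$gte": 1657509059545812, "$lt": 1667522903018337}}
).limit(20).explain("allPlansExecution")
// inspect queryPlanner.rejectedPlans and executionStats.allPlansExecution in the result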
For cases like this I recommend using a hint, which essentially forces Mongo to use the index you deem most fit.
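For example, a sketch forcing the shorter index by name (again reusing the query from the question):
db.some_collection.find(
    {"user_id": 12345, "req.time_us": {"$gte": 1657509059545812, "$lt": 1667522903018337}}
).hint("user_id_1_req.time_us_1").limit(20).explain("executionStats")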

AWS DocumentDB does not use indexes when $sort and $match at the same time

DocumentDB ignores the index on the matched field and instead uses the index on the sort field:
db.requests.aggregate([
{ $match: {'DeviceId': '5f68c9c1-73c1-e5cb-7a0b-90be2f80a332'}},
{ $sort: { 'Timestamp': 1 } }
])
Useful information:
> explain('executionStats')
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "admin_portal.requests",
"winningPlan" : {
"stage" : "IXSCAN",
"indexName" : "Timestamp_1",
"direction" : "forward"
}
},
"executionStats" : {
"executionSuccess" : true,
"executionTimeMillis" : "398883.755",
"planningTimeMillis" : "0.274",
"executionStages" : {
"stage" : "IXSCAN",
"nReturned" : "20438",
"executionTimeMillisEstimate" : "398879.028",
"indexName" : "Timestamp_1",
"direction" : "forward"
}
},
"serverInfo" : {
...
},
"ok" : 1.0,
"operationTime" : Timestamp(1622585939, 1)
}
> db.requests.getIndexKeys()
[
{
"_id" : 1
},
{
"Timestamp" : 1
},
{
"DeviceId" : 1
}
]
It works fine when I query documents without sorting, or when I use the find and sort functions instead of aggregation.
Important note: it also works perfectly on an original MongoDB instance, but not on DocumentDB.
This is more of a "how does DocumentDB choose a query plan" kind of question.
There are many answers on Stack Overflow about how Mongo does it.
Clearly, choosing the "wrong" index can happen from failed plan trials driven by data distribution; the issue here is that DocumentDB adds an unknown layer.
Amazon DocumentDB emulates the MongoDB 4.0 API on a purpose-built database engine that utilizes a distributed, fault-tolerant, self-healing storage system. As a result, query plans and the output of explain() may differ between Amazon DocumentDB and MongoDB. Customers who want control over their query plan can use the $hint operator to enforce selection of a preferred index.
They state that, due to this layer, differences might happen.
So now that we understand (kind of) why a wrong index is selected, what can we do? Well, unless you want to drop or rebuild your indexes differently, you need to use the hint option for your pipeline:
db.collection.aggregate(pipeline, {hint: "index_name"})
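For the pipeline from the question, a sketch might look like the following (the index name DeviceId_1 is assumed from the key list shown above); alternatively, a compound index that covers both the $match and the $sort removes the need for the hint:
db.requests.aggregate([
    { $match: { 'DeviceId': '5f68c9c1-73c1-e5cb-7a0b-90be2f80a332' } },
    { $sort: { 'Timestamp': 1 } }
], { hint: 'DeviceId_1' })

// or rebuild the indexes so a single index serves both stages
db.requests.createIndex({ DeviceId: 1, Timestamp: 1 })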

Can I batch aggregations in MongoDB?

I have some read-only aggregation pipelines that must be run in parallel with only one connection available. Is that possible, or does Mongo only allow find and update operations in bulk, but not aggregate?
The MongoDB driver uses a connection pool and executes aggregation commands asynchronously. You don't need to do anything special, apart from ensuring your application doesn't wait for responses before issuing the next query.
Consider a test collection:
mgeneratejs '{"num": {"$integer": {"min": 1, "max": 20}}, "text": {"$paragraph": {sentences: 5}}}' -n 100000 | mongoimport -d so -c text
a single aggregation query
db.text.aggregate([
{$match: {text: /ert.*duv/i}},
{$group:{_id:null, cnt:{$sum:1}, text:{$push: "$text"}}}
]);
takes circa 400 millis.
Running 10 of these in parallel (JavaScript):
const { MongoClient } = require('mongodb');

const started = new Date().getTime();
let client;
MongoClient.connect(url, { poolSize: 10 })
    .then(cl => {
        client = cl;
        const db = cl.db('so');
        return Promise.all([/ert.*duv/i, /kkd.*aql/i, /zop/i, /bdgtter/i, /ppa.*mcm/i, /ert.*duv/i, /kkd.*aql/i, /zop/i, /bdgtter/i, /ppa.*mcm/i]
            .map(regex => ([{ $match: { text: regex } }, { $group: { _id: null, cnt: { $sum: 1 }, text: { $push: "$text" } } }]))
            .map(pipeline => db.collection('text').aggregate(pipeline).toArray()));
    })
    .then(() => { client.close(); console.log("ended in " + (new Date().getTime() - started)); });
takes 1,883 millis (javascript time), of which ~1,830 are on the db side:
db.getCollection('system.profile').find({ns:"so.text", "command.aggregate": "text"}, {ts:1, millis:1})
{
"millis" : 442,
"ts" : ISODate("2018-02-22T17:32:39.738Z")
},
{
"millis" : 452,
"ts" : ISODate("2018-02-22T17:32:39.747Z")
},
{
"millis" : 445,
"ts" : ISODate("2018-02-22T17:32:39.756Z")
},
{
"millis" : 471,
"ts" : ISODate("2018-02-22T17:32:39.762Z")
},
{
"millis" : 448,
"ts" : ISODate("2018-02-22T17:32:39.771Z")
},
{
"millis" : 491,
"ts" : ISODate("2018-02-22T17:32:39.792Z")
},
{
"millis" : 566,
"ts" : ISODate("2018-02-22T17:32:39.854Z")
},
{
"millis" : 561,
"ts" : ISODate("2018-02-22T17:32:39.856Z")
},
{
"millis" : 1822,
"ts" : ISODate("2018-02-22T17:32:41.118Z")
},
{
"millis" : 1834,
"ts" : ISODate("2018-02-22T17:32:41.124Z")
}
If you do the math, you see all 10 started at about the same time (2018-02-22T17:32:39.300Z), and mongostat indeed shows 10 more connections at the time of the script execution.
Limiting poolSize to 5 doubles the time, as the requests will be executed in 2 batches of 5.
The driver uses about 1 MB of RAM per connection, so 100 connections per worker is not unrealistic.
To summarise: ensure the connection pool is configured properly, check the number of connections actually used at runtime, and make sure you handle requests asynchronously at the application level.
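As a rough sketch of those checks (note: in driver 4.x the option is maxPoolSize rather than poolSize, and serverStatus requires appropriate privileges):
// application side: size the pool explicitly
const client = new MongoClient(url, { maxPoolSize: 10 });

// server side (mongo shell): see how many connections are actually open
db.serverStatus().connections   // { current: ..., available: ..., totalCreated: ... }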

MongoDB optimization

I need to optimize my MongoDB performance but can't figure out how. Maybe there are some tips, or maybe I should use another storage engine. Any ideas are welcome.
I have the following log output describing the query:
2015-08-04T15:09:56.226+0300 [conn129682] command mongodb_db1.$cmd command: aggregate { aggregate: "collection", pipeline: [ { $match: { _id.index_id_1: 4931359 } } ] } keyUpdates:0 numYields:39 locks(micros) r:83489 reslen:177280 286ms
I have a collection named collection which contains the following data structure:
{
"_id" : {
"x" : "x",
"index_id_1" : NumberLong(5617088)
},
"value" : {
"value_1" : 1.0000000000000000,
"value_2" : 0.0000000000000000,
"value_3" : 1.0000000000000000
}
}
Querying the collection stats gives the following details:
{
"ns" : "mongodb_db1.collection",
"count" : 2.07e+007,
"size" : 4968000000.0000000000000000,
"avgObjSize" : 240,
"storageSize" : 5524459408.0000000000000000,
"numExtents" : 25,
"nindexes" : 3,
"lastExtentSize" : 5.36601e+008,
"paddingFactor" : 1.0000000000000000,
"systemFlags" : 0,
"userFlags" : 1,
"totalIndexSize" : 4475975728.0000000000000000,
"indexSizes" : {
"_id_" : 2884043120.0000000000000000,
"_id.x.index_id_1" : 1.07118e+009,
"_id.index_id_1" : 5.20754e+008
},
"ok" : 1.0000000000000000
}
Running on single node ( no shards ).
MongoDB version is: 2.4.
Installed RAM (MB): 24017 ( index size ~120GB )
10gen / MongoDB are running a series of FREE online courses that cover all you need to know (the latest iteration starts today). Simply head over and sign up for the DBA course and, if you're feeling brave, a couple of the others, but there is a lot of common/duplicated material between all the variants at the beginning.