What is "locks(micros) w:16035" and "locks(micros) r:10051" in the mongodb log - mongodb

I have enabled profiling in the MongoDb config file.
profile=2
slowms=5
The mongodb log contains all the queries that took longer than 5 milliseconds (weird, I thought profile=2 meant log ALL queries).
For all update entries, the line ends with locks(micros) w:17738 17ms (the actual number varies). For all the query entries, the line contains locks(micros) r:15208
Sample line
Tue Dec 03 02:29:43.084 [conn11] update DbName.CollectionName query: { _id: ObjectId('51dfd2791bbdbe0b44395553')} update: { json for new document } nscanned:1 nmoved:1 nupdated:1 keyUpdates:0 locks(micros) w:17738 17ms
Reading the docs, I found the following section,
system.profile.lockStats
New in version 2.2.
The time in microseconds the operation spent acquiring and holding locks. This field reports data for the following lock types:
R - global read lock
W - global write lock
r - database-specific read lock
w - database-specific write lock
Okay, so the r & w are some database-specific lock times. But which one? Is it time spent holding the lock or time spent waiting to acquire a lock?

profile=2
slowms=5
The mongodb log contains all the queries that took longer than 5 milliseconds (weird, I thought profile=2 meant log ALL queries).
Setting profile to level 2 means that all queries are included in the system.profile capped collection irrespective of the slowms value. This does not affect what queries will be included in the mongod log.
Setting slowms to 5ms defines the threshold for slow queries that will be logged (irrespective of profiling) and included in the system.profile collection if profile is level 1 (i.e. profile slow queries).
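For reference, the same settings can be applied at runtime from the mongo shell. A minimal sketch (the two-argument form of setProfilingLevel sets the profiling level and the slowms threshold for the current database):
// Runtime equivalent of profile=2 / slowms=5 for the current database
db.setProfilingLevel(2, 5)
// Profiled operations are written to the capped system.profile collection
db.system.profile.find().sort({ ts: -1 }).limit(5).pretty()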
If you want to see queries in your logs as well, you can increase the loglevel to 1 or higher:
db.adminCommand( { setParameter: 1, logLevel: 1 } )
WARNING: increased log levels get very noisy, and the logs are not capped like the system.profile collection!
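To check the current verbosity and put it back to the default once you are done debugging, the same setParameter/getParameter interface can be used:
// Check the current log level
db.adminCommand( { getParameter: 1, logLevel: 1 } )
// Restore the default verbosity
db.adminCommand( { setParameter: 1, logLevel: 0 } )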
Okay, so the r & w are some database-specific lock times. But which one? Is it time spent holding the lock or time spent waiting to acquire a lock?
For the system.profile.lockStats there are separate fields for acquiring (timeAcquiringMicros) and holding (timeLockedMicros) locks.
The "locks(micros)" details in the log are only showing the timeLockedMicros details (source reference: db/lockstat.cpp).
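If you want both numbers for a given operation, the profiler documents expose them separately. A quick sketch against the sample update above (DbName.CollectionName is just the placeholder namespace from the question):
// timeLockedMicros = time spent holding the lock,
// timeAcquiringMicros = time spent waiting to acquire it
db.system.profile.find(
    { op: "update", ns: "DbName.CollectionName" },
    { ts: 1, millis: 1,
      "lockStats.timeLockedMicros": 1,
      "lockStats.timeAcquiringMicros": 1 }
).sort({ ts: -1 }).limit(5).pretty()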
I was hoping for a link that mentions what the various fields of the log file are
I'm not aware of any detailed documentation for the log file format, and there are definitely some variations between major MongoDB releases. A great open source toolkit for working with MongoDB log files is mtools. You could peek into the code there to see how it parses different log lines.

Related

MongoDB - Find() by _id is taking so long - planSummary:IDHACK (timeAcquiringMicros is equal to query time)

I would like to know why the query below is taking so long (21 seconds) to execute even though the collection has just one document. I have a replica set PSA instance with 130 databases giving a total of 500K files between collections and indexes (350GB). The Linux server has 32GB RAM and 8 CPUs, but we are not IO- or CPU-bound. I'm using MongoDB 3.2 with the WiredTiger engine.
What is the relation between timeAcquiringMicros and the query time?
2019-10-03T11:30:34.249-0300 I COMMAND [conn370659] command bd01.000000000000000000000000 command: find {
find:"000000000000000000000000",
filter:{
_id:ObjectId('000000000000000000000006')
},
batchSize:300
}planSummary:IDHACK
keysExamined:1
docsExamined:1
idhack:1
cursorExhausted:1
keyUpdates:0
writeConflicts:0
numYields:0
nreturned:1
reslen:102226
locks:{
Global:{
acquireCount:{
r:2
}
},
Database:{
acquireCount:{
r:1
},
acquireWaitCount:{
r:1
},
timeAcquiringMicros:{
r:21893874
}
},
Collection:{
acquireCount:{
r:1
}
}
}protocol:op_query 21894ms
MongoDB uses multiple granularity locking to help improve parallelism. There are some operations that need to lock on the Global, Database, or Collection level.
In your query you see several acquireCount: { r: X } entries. The lowercase r means the operation is acquiring an "intent shared lock", which is just a way of saying: I don't need to lock anyone out, but I want to take a loose read lock at each of these levels before I get to the level I actually need. This prevents your query from executing while there are exclusive writes happening at any level you need to go through.
Importantly for you, you saw this:
Database:{
acquireCount:{
r:1
},
acquireWaitCount:{
r:1
},
timeAcquiringMicros:{
r:21893874
}
Meaning it took 21893874 microseconds (roughly 21.9 seconds, matching the 21894ms total) to acquire the lock you wanted at the Database level. Your query made it through the Global level, but was blocked by something that held an exclusive lock at the Database level. I recommend becoming acquainted with this table in the MongoDB documentation: What locks are taken by some common client operations?
One hypothesis in your situation is that someone decided to build an index in the foreground. This is interesting because you usually create indexes on collections, but a foreground build takes an exclusive write lock on the database, which blocks every collection in that database.
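If you can catch the situation while it is happening, one way to see what is holding or waiting on those locks is db.currentOp(). A hedged sketch (the waitingForLock, locks and secs_running fields come from the currentOp output; mode "W" is an exclusive Database lock):
// List operations that are waiting on a lock, or holding an exclusive Database lock
db.currentOp(true).inprog.filter(function (op) {
    return op.waitingForLock || (op.locks && op.locks.Database === "W");
}).forEach(function (op) {
    printjson({ opid: op.opid, op: op.op, ns: op.ns,
                secs_running: op.secs_running, locks: op.locks });
});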
Another hypothesis is that your database is simply under heavy load. You'll need something like Percona's MongoDB Prometheus Exporter to scrape data from your database, but if you can get the data another way, these two blog posts can help you understand your performance bottleneck:
Percona Monitoring and Management (PMM) Graphs Explained: WiredTiger and Percona Memory Engine
Percona Monitoring and Management (PMM) Graphs Explained: MongoDB MMAPv1

MongoDB concurrency - reduces the performance

I understand that mongo db does locking on read and write operations.
My Use case:
Only read operations. No write operations.
I have a collection about 10million documents. Storage engine is wiredTiger.
Mongo version is 3.4.
I made a request which should return 30k documents - it took 650ms on average.
When I make the same request concurrently - 100 times - it takes from a few seconds up to 2 minutes for all the requests to be handled.
I have single node to serve the data.
How do I access the data:
Each document contains 25 to 40 fields. I indexed a few fields, and I query based on one indexed field.
API will return all the matching documents in json form.
Other informations: API is written using Spring boot.
Concurrency tested through JMeter shell script from command line on remote machine.
So,
My question:
Am I missing any optimizations (storage engine level, version)?
Can't I get all read requests served in less than a second?
If so, what SLA can I keep for this use case?
Any suggestions?
Edit:
I enabled database profiler in mongodb with level 2.
My single query is internally converted into 4 operations:
Initial read
getMore
getMore
getMore
These are the queries found through profiler.
In total, they take less than 100ms. Is that really true?
My concurrent queries:
Now, when I make 100 requests, nearly 150 operations take more than 100ms, 100 operations take more than 200ms, and 90 operations take more than 300ms.
As per my single-query analysis, 100 requests are internally converted into 400 queries. It is a fixed pattern, which I verified by checking the query tag in the profiler output.
I suspect this is what affects my request performance.
My single query is internally converted into 4 operations:
Initial read
getMore
getMore
getMore
That's the way mongo cursors work. The documents are transferred from the db to the app in batches. IIRC the first batch is around 100 documents plus a cursor id, then subsequent getMore calls retrieve the next batches by cursor id.
You can define the batch size (number of documents per batch) from the application. A batch cannot exceed 16MB, e.g. if you set a batch size of 30,000 it will fit into a single batch only if the documents are smaller than about 500B each.
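If you want fewer getMore round trips, you can raise the batch size from the shell or the driver, keeping the 16MB limit in mind. A minimal sketch (myCollection, indexedField and the value are illustrative names):
// Ask the server for up to 10,000 documents per batch instead of the default;
// each batch is still capped at 16MB, so very large documents reduce the effective count
db.myCollection.find({ indexedField: "someValue" }).batchSize(10000)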
Your investigation clearly shows performance degradation under load. There are too many factors and I believe locking is not one of them. WiredTiger takes exclusive locks at the document level for regular write operations, and you are doing only reads during your tests, aren't you? If in any doubt, you can compare the results of db.serverStatus().locks before and after the tests to see how many write locks were acquired. You can also run db.serverStatus().globalLock during the tests to check the queue. More details about locking and concurrency are here: https://docs.mongodb.com/manual/faq/concurrency/#for-wiredtiger
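A small sketch of the before/after check just described, run from the mongo shell around your test:
// Snapshot lock counters before the load test...
var before = db.serverStatus().locks;
// ...run the JMeter test, then snapshot again and compare
var after = db.serverStatus().locks;
printjson({ before: before, after: after });
// While the test is running, a growing queue here points at lock contention
printjson(db.serverStatus().globalLock.currentQueue);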
The bottleneck is likely somewhere else. There are a few generic things to check:
Query optimisation. Ensure you use indexes. The profiler should show no "COLLSCAN" stage in the execStats field (see the explain sketch after this list).
System load. If your database shares system resources with the application, it may affect performance of the database. E.g. BSON to JSON conversion in your API is quite CPU hungry and may affect performance of the queries. Check the system's load average with top or htop on *nix systems.
Mongodb resources. Use mongostat and mongotop to check whether the server has enough RAM, IO, file descriptors, connections etc.
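To confirm index usage for the first point, a hedged explain sketch (collection and field names are illustrative):
// An IXSCAN stage in the winning plan (rather than COLLSCAN) confirms the query is served by an index
db.myCollection.find({ indexedField: "someValue" }).explain("executionStats")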
If you cannot spot anything obvious, I'd recommend seeking professional help. I find the simplest way to get it is by exporting the data to Atlas and running your tests against the cluster. Then you can ask the support team whether they can advise any improvements to the queries.

Debugging flow for mongodb atlas alert

I'm fairly new to mongodb and atlas and am confused by the following alert
Query Targeting: Scanned Objects / Returned has gone above 1000
I expected there to be more data to aid in debugging, such as the query or at least the collection. The query apparently wasn't slow, because the performance advisor didn't catch anything.
The only info given in the alert is
- time created
- the replica set
- a link to the shard
- the type of shard (primary/secondary)
How am I supposed to debug the alerted issue?
A later alert had info on how to solve the issue - which, in short, is to download the MongoDB logs and search for the inefficient query.
To download the logs
1. Navigate to the Cluster page
If you do not see the cluster you want on the page, ensure you have selected the proper Project
2. Select the cluster
a. Click the ellipsis icon (...) next to the cluster containing the mongod instance whose logs you want to download.
b. Select Download Logs.
3. In the Download Logs modal, edit the following fields
Select process: Select the process for which you want logs. Valid options are mongod and mongod-audit-log.
Select server: Select the server in the cluster whose logs you want to retrieve.
Start Time: Specify the date and time in your group’s time zone defining the inclusive lower bound of log activity to return. The start time must be less than 30 days ago.
End Time: Specify the date and time in your group’s time zone defining the inclusive upper bound of log activity to return.
4. Click Download Logs
An inefficient query is explained here
The following mongod log entry shows statistics generated from an inefficient query:
<Timestamp> COMMAND <query> planSummary: COLLSCAN keysExamined:0
docsExamined:10000 cursorExhausted:1 numYields:234
nreturned:4 protocol:op_query 358ms
This query scanned 10,000 documents and returned only 4 for a ratio of 2500, which is highly inefficient.
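If profiling is enabled, a hedged alternative to grepping the downloaded logs is to compute the same scanned/returned ratio from system.profile. A sketch (docsExamined and nreturned are standard profiler fields; the 1000 threshold mirrors the alert):
// Surface profiled operations whose scanned-to-returned ratio exceeds the alert threshold
db.system.profile.aggregate([
    { $match: { nreturned: { $gt: 0 } } },
    { $project: { ns: 1, millis: 1, docsExamined: 1, nreturned: 1,
                  ratio: { $divide: [ "$docsExamined", "$nreturned" ] } } },
    { $match: { ratio: { $gte: 1000 } } },
    { $sort: { ratio: -1 } },
    { $limit: 10 }
])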

For the Mongo database profiler, is there any difference between levels 0 and 1?

According to Mongo documentation I found here:
https://docs.mongodb.com/v3.2/tutorial/manage-the-database-profiler/#profiling-levels
a database profiler level of '0' means
the profiler is off, does not collect any data. mongod always writes operations longer than the slowOpThresholdMs threshold to its log. This is the default profiler level.
Meanwhile a level of '1' means
collects profiling data for slow operations only. By default slow operations are those slower than 100 milliseconds.
You can modify the threshold for “slow” operations with the slowOpThresholdMs runtime option or the setParameter command. See the Specify the Threshold for Slow Operations section for more information.
I do not see how these are different. They both only log slow operations, and both look to the same value (slowOpThresholdMs) to do so.
Am I missing something? If these are indeed the same, does someone know why the options are defined this way? I got confused because other parts of the documentation (e.g. https://docs.mongodb.com/v3.2/reference/method/db.setProfilingLevel/) seem to indicate that level 0 means no profiling whatsoever, which is not what I observed.
There are two possible targets for profiler information:
The Mongod log
The system.profile collection in the database which is being profiled
The level controls what is written and where it is written to:
level 0 means that no output is written to the system.profile collection, but Mongo will still write information about operations that take longer than slowOpThresholdMs to the Mongo log
level 1 means that Mongo will write profiler documents to the system.profile collection for operations which take longer than slowOpThresholdMs, and Mongo will also write information about operations that take longer than slowOpThresholdMs to the Mongo log
So, the key difference is that profile documents will be written to the system.profile collection for level > 0.
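A small sketch showing the difference in practice (the 50ms threshold is illustrative):
// Level 0: nothing is written to system.profile; slow ops still go to the mongod log
db.setProfilingLevel(0, 50)
// Level 1: slow ops (here, > 50ms) are additionally written to system.profile
db.setProfilingLevel(1, 50)
// Verify the current settings
db.getProfilingStatus()   // e.g. { "was" : 1, "slowms" : 50 }
// Only at level > 0 will documents show up here
db.system.profile.find().sort({ ts: -1 }).limit(3).pretty()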

My mongodb collection got dropped. How can I see what happened?

I have a collection that is populated automatically to simulate user input. I used to have 32,000+ documents in it; now there are only 2,000. My collection was dropped some time yesterday and I don't know what happened.
Luckily, there is a timestamp on every document, so I can see exactly when it happened. (The oldest document is only a day old; there are new ones every 2 minutes.) How can I see what happened?
It was likely done by a script, so it doesn't appear in .dbshell, which is capped to 99 lines anyway. Is there a history of everything done to the database, not just manually?
Is there a way to know what caused this?
Extra info: I'm the only one with access to the database for now, and I have only one script that can drop my tables, which I haven't touched in ages.
Check this Answer https://stackoverflow.com/a/15204638/4996928 on StackOverflow
I ended up solving this by starting mongod like this (hammered and ugly, yeah... but works for development environment):
mongod --profile=1 --slowms=1 &
This enables profiling and sets the threshold for "slow queries" as 1ms, causing all queries to be logged as "slow queries" to the file:
/var/log/mongodb/mongodb.log
Now I get continuous log outputs using the command:
tail -f /var/log/mongodb/mongodb.log
An example log:
Mon Mar 4 15:02:55 [conn1] query dendro.quads query: { graph: "u:http://example.org/people" } ntoreturn:0 ntoskip:0 nscanned:6 keyUpdates:0 locks(micros) r:73163 nreturned:6 reslen:9884 88ms
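For what it's worth, a similar effect can be had at runtime, without restarting mongod (the profiling level applies to the current database):
// Roughly equivalent to --profile=1 --slowms=1: profile (and log) everything slower than 1ms
db.setProfilingLevel(1, 1)
// Set it back when you are done
db.setProfilingLevel(0)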