Debugging flow for MongoDB Atlas alert - mongodb-atlas

I'm fairly new to MongoDB and Atlas and am confused by the following alert:
Query Targeting: Scanned Objects / Returned has gone above 1000
I expect there to be more data to aid in debugging, such as the query or at least the collection. The query apparently wasn't slow, since the Performance Advisor didn't catch anything.
The only info given in the alert is:
- time created
- the replica set
- a link to the shard
- the type of shard (primary/secondary)
How am I supposed to debug the alerted issue?

A later alert had info on how to solve the issue, which in short is to download the MongoDB logs and search for the inefficient query.
To download the logs:
1. Navigate to the Cluster page
If you do not see the cluster you want on the page, ensure you have selected the proper Project
2. Select the cluster
a. Click the ellipsis icon (...) next to the cluster containing the mongod instance whose logs you want to download.
b. Select Download Logs.
3. In the Download Logs modal, edit the following fields:
Select process: Select the process for which you want logs. Valid options are mongod and mongod-audit-log.
Select server: Select the server in the cluster whose logs you want to retrieve.
Start Time: Specify the date and time in your group’s time zone defining the inclusive lower bound of log activity to return. The start time must be less than 30 days ago.
End Time: Specify the date and time in your group’s time zone defining the inclusive upper bound of log activity to return.
4. Click Download Logs.
An inefficient query is explained here:
The following mongod log entry shows statistics generated from an inefficient query:
<Timestamp> COMMAND <query> planSummary: COLLSCAN keysExamined:0 docsExamined:10000 cursorExhausted:1 numYields:234 nreturned:4 protocol:op_query 358ms
This query scanned 10,000 documents and returned only 4 for a ratio of 2500, which is highly inefficient.
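Once you've located the offending query in the downloaded logs, one way to proceed (a minimal sketch, assuming a hypothetical orders collection queried on a status field) is to confirm the collection scan with explain() and then add a supporting index:

// confirm the plan is a COLLSCAN and compare totalDocsExamined to nReturned
db.orders.find({ status: "A" }).explain("executionStats")

// add an index on the queried field so the planner can use an IXSCAN instead
db.orders.createIndex({ status: 1 })

After the index is built, the same explain() should report an IXSCAN plan and a docsExamined figure close to nReturned.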

Related

I need help for MongoDB Atlas Query Targeting: Scanned Objects / Returned has gone above 1000 alert

I'm using a MongoDB Atlas paid-tier replica set (Primary-Secondary-Secondary).
Every hour I create a new collection and insert about 1.5 to 2 million documents.
Every time I insert, the primary is unchanged in Atlas' cluster metrics, but query targeting on the secondaries rises rapidly.
As a result, it interferes with alerts for genuinely dangerous COLLSCAN operations, and it is very noisy because the Atlas alarm fires every hour.
My application uses readPreference=secondary, so the alarm is difficult to disable.
I need an opinion on how this can happen.
Below is the Atlas metrics page that I checked.

How to disable MongoDB aggregation timeout

I want to run an aggregation on my large data set (about 361K documents) and insert the results into another collection.
I'm getting this error:
I tried to increase Max Time, but it has a maximum and it's not enough for my data set. I found https://docs.mongodb.com/manual/reference/method/cursor.noCursorTimeout/ but it seems noCursorTimeout only applies to find, not aggregation.
Please tell me how I can disable the cursor timeout, or suggest another solution.
I am no MongoDB expert, but I'll share what I know.
MongoDB aggregation cursors don't have a mechanism to adjust the batch size or set cursor timeouts.
Therefore there is no direct way to alter this, and the timeout of an aggregation query depends solely on the cursorTimeoutMillis parameter of the mongod or mongos instance. Its default value is 10 minutes.
Your only option is to change this value with the command below.
use admin
db.runCommand({setParameter:1, cursorTimeoutMillis: 1800000})
However, I strongly advise against using this command, because it weakens a safety mechanism built into MongoDB: cursors that sit idle for more than 10 minutes are automatically cleaned up, which keeps the load on the MongoDB server lower. If you change this parameter (say, to 30 minutes), MongoDB will allow idle cursors to linger in the background for those 30 minutes, which will not only make all new queries slower to execute, but also increase load and memory usage on the MongoDB side.
You have a couple of workarounds: reduce the number of documents if you're working in MongoDB Compass, or copy and run the commands in the mongo shell (I've had success with this method so far).
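One more option the answer above doesn't mention, assuming your server supports it ($out has been available for a long time; $merge needs MongoDB 4.2+): have the server write the aggregation results directly into the target collection, so the client never iterates a large cursor that could time out. A rough sketch with hypothetical collection names and pipeline stages:

// run the pipeline server-side and write the results straight into
// "targetCollection"; no client-side cursor iteration is involved
db.sourceCollection.aggregate(
  [
    { $match: { processed: false } },        // hypothetical pipeline stages
    { $group: { _id: "$userId", total: { $sum: "$amount" } } },
    { $merge: { into: "targetCollection" } } // or { $out: "targetCollection" }
  ],
  { allowDiskUse: true }                     // let large sorts/groups spill to disk
)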

MongoDB: find a query that throws an alert

So I am getting an alert from Mongo - occasionally...
Query Targeting: Scanned Objects / Returned has gone above 1000
Is there a way to see the offending query specifically? I see graphs of trends over time in my dashboard, but my "performance advisor" shows no "slow" queries... and the email alerts I get specifically say to check the "performance advisor".
Any help is appreciated.
Normally, when the Scanned Objects / Returned ratio is large, that means those queries are slow and will show up in the slow query log. If nothing is showing up there, you can reduce the slowms setting that determines which queries will be written to the slow query log.
$explain and the $collStats aggregation operator are two other tools that are worth being aware of, but for this case I'd recommend updating your profiling level (db.setProfilingLevel) and then seeing where you're at!
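A minimal sketch of that suggestion (the 100 ms threshold is an arbitrary choice of mine, and profiler field names vary by server version, e.g. docsExamined in 3.2+ vs nscanned earlier):

// profile operations slower than 100ms
// (level 1 = slow ops only, level 2 = everything)
db.setProfilingLevel(1, 100)

// then look for operations that examined far more documents than they returned
db.system.profile.find(
  { nreturned: { $gt: 0 } },
  { ns: 1, millis: 1, docsExamined: 1, nreturned: 1 }
).sort({ millis: -1 }).limit(10)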
If you're using Atlas, the "Profiler" tab shows the queries from the slow query log in an explorable way. If you're not on Atlas, mtools has some good MongoDB log parsing tools.
Just recently we have been seeing the same error in alerts for our Mongo cluster:
Query Targeting: Scanned Objects / Returned has gone above 1000
I was not able to find anything in our code or logs that would return this sort of huge chunk.
It turned out to be related to our Charts page, i.e. the alert fires when we open the charts.
I suppose it's because of the many aggregations Charts runs, and I am not sure how to optimise those, as they're mainly UI setup.
So if you have any charts on MongoDB, it's worth checking whether opening them triggers the alert.

What is "locks(micros) w:16035" and "locks(micros) r:10051" in the MongoDB log

I have enabled profiling in the MongoDB config file.
profile=2
slowms=5
The MongoDB log contains all the queries that took longer than 5 milliseconds (weird, I thought profile=2 meant log ALL queries).
For all update entries, the line ends with locks(micros) w:17738 17ms (the actual number varies). For all the query entries, the line contains locks(micros) r:15208
Sample line:
Tue Dec 03 02:29:43.084 [conn11] update DbName.CollectionName query: { _id: ObjectId('51dfd2791bbdbe0b44395553')} update: { json for new document } nscanned:1 nmoved:1 nupdated:1 keyUpdates:0 locks(micros) w:17738 17ms
Reading the docs, I found the following section:
system.profile.lockStats
New in version 2.2.
The time in microseconds the operation spent acquiring and holding locks. This field reports data for the following lock types:
R - global read lock
W - global write lock
r - database-specific read lock
w - database-specific write lock
Okay, so the r & w are some database-specific lock times. But which one? Is it time spent holding the lock or time spent waiting to acquire a lock?
profile=2
slowms=5
The MongoDB log contains all the queries that took longer than 5 milliseconds (weird, I thought profile=2 meant log ALL queries).
Setting profile to level 2 means that all queries are included in the system.profile capped collection irrespective of the slowms value. This does not affect what queries will be included in the mongod log.
Setting slowms to 5ms defines the threshold for slow queries that will be logged (irrespective of profiling) and included in the system.profile collection if profile is level 1 (i.e. profile slow queries).
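For reference, a quick sketch of the runtime equivalent of those config file settings (using the legacy shell's (level, slowms) signature):

// same effect as profile=2 / slowms=5 in the config file
db.setProfilingLevel(2, 5)

// verify the current settings
db.getProfilingStatus()   // → { "was" : 2, "slowms" : 5 }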
If you want to see queries in your logs as well, you can increase the loglevel to 1 or higher:
db.adminCommand( { setParameter: 1, logLevel: 1 } )
WARNING: increased log levels get very noisy, and the logs are not capped like the system.profile collection!
Okay, so the r & w are some database-specific lock times. But which one? Is it time spent holding the lock or time spent waiting to acquire a lock?
For the system.profile.lockStats there are separate fields for acquiring (timeAcquiringMicros) and holding (timeLockedMicros) locks.
The "locks(micros)" details in the log are only showing the timeLockedMicros details (source reference: db/lockstat.cpp.
I was hoping for a link that explains what the various fields of the log file are.
I'm not aware of any detailed documentation for the log file format, and there are definitely some variations between major MongoDB releases. A great open source toolkit for working with MongoDB log files is mtools. You could peek into the code there to see how it parses different log lines.

Restrict querying MongoDB collection to inactive chunks only

I am building an application which will perform 2 phases.
Execute Phase - The first phase is very INSERT intensive (as many inserts as the hardware can possibly execute in a second). This is essentially a logging trail of work performed.
Validation Phase - The next phase will query the logs generated by phase 1, compare them to an external source, and perform an UPDATE on the record to store some statistics. This process is second priority to phase 1.
I'm trying to see if it's feasible to do them in parallel while keeping write locking to a minimum for the execution phase. I thought one way to do this would be to restrict my validation phase to query only older records that are not in the chunk currently being inserted into by the execution phase. Is there something in MongoDB that restricts a find() to query only chunks that have not been accessed in some configurable amount of time?
You probably want to set up a replica set: insert into the master and fetch from the secondaries. That way, your inserts won't be blocked at all.
You can use the mentioned replica set with slaveOk, and update in the master.
You can use a timestamp field or an ObjectId (which already contains a timestamp) for filtering.
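A minimal sketch of that idea (the workLog collection name and 60-second cutoff are hypothetical), exploiting the fact that the first 4 bytes of an ObjectId encode its creation time in epoch seconds:

// build an ObjectId whose embedded timestamp is 60 seconds in the past:
// 8 hex chars of epoch seconds + 16 zero chars = a 24-char ObjectId
var cutoffSecs = Math.floor(Date.now() / 1000) - 60;
var cutoffId = ObjectId(cutoffSecs.toString(16) + "0000000000000000");

// the validation phase only touches records older than the cutoff,
// staying away from the documents currently being inserted
db.workLog.find({ _id: { $lt: cutoffId } })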