Calculate WiredTiger cache miss from db.serverStatus output - mongodb

I've been reading the article at https://medium.com/dbkoda/the-notorious-database-cache-hit-ratio-c7d432381229, which calculates the WiredTiger cache miss rate from data in the db.serverStatus() output.
However, after running the command (and also checking the Java API, which doesn't seem to expose such a method, so I don't really know how the author is using the API), I can't find the properties he retrieves from the resulting Document, namely 'pages requested from the cache' and 'pages read into cache'.
The only related metrics I can see are a couple included under extra_info, namely page_faults and page_reclaims; if I'm correct, those are cache misses and cache hits respectively, right?
I'm trying to gauge cache performance (whether certain aggregations hit the cache or not) for particular queries.
Is there any way to obtain this metric straight away via MongoDB commands?

The code given in the article is intended to be run in the mongo shell; the metrics it reads live under the wiredTiger.cache section of the db.serverStatus() output, not under extra_info.
The driver equivalent is the serverStatus command: https://docs.mongodb.com/manual/reference/command/serverStatus/.
You would execute it using your driver's facility for running admin or arbitrary database commands. For the Ruby driver, that is https://docs.mongodb.com/ruby-driver/current/tutorials/ruby-driver-database-tasks/#arbitrary-comands.
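In the shell it amounts to something like the following (a minimal sketch of the article's calculation, not its exact code; the stat names are the WiredTiger counters as they appear in the serverStatus output):

var cache = db.serverStatus().wiredTiger.cache;
var requested = cache["pages requested from the cache"];
var readIntoCache = cache["pages read into cache"];   // pages that had to come from disk
var missRate = readIntoCache / requested;
print("WiredTiger cache miss rate: " + (missRate * 100).toFixed(2) + "%");

Note that these counters accumulate from server start, so to measure the miss rate of a specific workload you would sample them before and after running your aggregations and compute the ratio from the deltas.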

Related

Is it possible to see the incoming queries in mongodb to debug/trace issues?

I have mongo running on my MacBook (OS X).
Is it possible to run some kind of 'monitor' that will display any incoming requests to my mongodb?
I need to verify that the queries coming from my application are formatted correctly.
You will find the following tools and utilities useful for monitoring as well as diagnostic purposes. All of them except mtools are packaged with the MongoDB server (though they are sometimes installed separately).
1. Database Profiler
The profiler stores every CRUD operation coming into the database; it is off by default. Having it on is quite expensive: it turns every read into a read plus an insert, and every write into a write plus an insert. CAUTION: Keeping it on can quickly overwhelm the server with incoming operations, saturating the I/O.
But it is a very useful tool when used for a short time to find out what is going on with database operations, and it is best suited to development environments.
The current profiler setting can be read with the db.getProfilingLevel() command; to activate the profiler, use db.setProfilingLevel(level). What the profiler captures can be verified in the db.system.profile collection, which you can query like any other collection using the find or aggregate methods. The op field of a db.system.profile document specifies the type of database operation; e.g., for queries it is "query".
The profiler has three levels:
0 captures nothing; the profiler is off (this is the default).
1 captures every operation that takes longer than the slow-operation threshold (100 ms by default).
2 captures every operation; this can be used to find the actual load that is coming in.
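For example, a quick profiling session in the shell (the collection name mycoll is a placeholder):

db.setProfilingLevel(2)                        // capture every operation
db.mycoll.find({ status: "A" }).toArray()      // the operation to inspect
db.system.profile.find({ op: "query" }).sort({ ts: -1 }).limit(1).pretty()
db.setProfilingLevel(0)                        // turn the profiler back off

The millis field of each db.system.profile document shows how long the operation took.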
2. mongoreplay
mongoreplay is a traffic capture and replay tool for MongoDB that you can use to inspect and record commands sent to a MongoDB instance, and then replay those commands back onto another host at a later time. NOTE: Available for Linux and macOS.
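A sketch of a record-and-replay round trip (the interface name, file name, and target host are placeholders; check mongoreplay --help for the exact flags in your version):

mongoreplay record -i eth0 -e "port 27017" -p recording.playback
mongoreplay play -p recording.playback --host mongodb://replay-target:27017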
3. mongostat
The mongostat command-line utility provides a quick overview of the status of a currently running mongod instance.
You can view the incoming operations in real time. The statistics are displayed every second by default; there are various options to customize the output, the time interval, etc.
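For example, to poll a server every 5 seconds (the host is a placeholder; the trailing number is the refresh interval in seconds):

mongostat --host localhost:27017 5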
4. mtools
mtools is a collection of helper scripts to parse, filter, and visualize (through graphs) MongoDB log files.
You will find the mlogfilter script useful; it reduces the amount of information from MongoDB log files using various command options. For example, mlogfilter mongod.log --operation query filters the log down to query operations only.
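Options can also be combined; for instance, this sketch (the log path is a placeholder) keeps only queries slower than 1000 ms:

mlogfilter /var/log/mongodb/mongod.log --operation query --slow 1000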

Is db.stats() a blocking call for MongoDB?

While researching how to check the size of a MongoDB database, I found this comment:
Be warned that dbstats blocks your database while it runs, so it's not suitable in production. https://jira.mongodb.org/browse/SERVER-5714
The linked bug report (which is still open) quotes the Mongo docs as saying:
Command takes some time to run, typically a few seconds unless the .ns file is very large (via use of --nssize). While running other operations may be blocked.
However, when I check the current Mongo docs, I don't find that text. Instead, they say:
The time required to run the command depends on the total size of the database. Because the command must touch all data files, the command may take several seconds to run.
For MongoDB instances using the WiredTiger storage engine, after an unclean shutdown, statistics on size and count may be off by up to 1000 documents as reported by collStats, dbStats, count. To restore the correct statistics for the collection, run validate on the collection.
Does this mean the WiredTiger storage engine changed this to a non-blocking call by keeping ongoing stats?
A bit late to the game, but I found this question while looking for the answer, and the answer is: yes. Until 3.6.12 / 4.0.5, dbStats acquired a "shared" lock ("R"), which blocked all write requests during execution. Since those versions it takes an "intent shared" lock ("r"), which does not block write requests. Read requests were never impacted.
Source: https://jira.mongodb.org/browse/SERVER-36437
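For reference, this is the call in question; the optional scale makes it report sizes in MB:

db.runCommand({ dbStats: 1, scale: 1048576 })

On servers older than 3.6.12 / 4.0.5 this held the database-level "R" lock for its whole run; on newer ones it only takes "r".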

What is default timeout for MongoDB operation (CRUD and aggregate)?

I couldn't find information about the default time limit for executing an operation in MongoDB. Some of my aggregate commands take minutes (very large reports). Waiting that long is OK for me, but I'm afraid of getting an error.
I know that I can set a timeout, but a lot of my software's users run their own servers, which of course use the default settings.
Until this feature is implemented, this will essentially be a driver/client-level setting. The query will run until completion on the server, though eventually the server might time out a cursor - see the cursorinfo command for more on that.
To figure out what your settings are, you will need to consult your driver's documentation. There may be multiple settings that apply, depending on what you are looking for, like the various options in the Java driver, for example.
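For what it's worth, MongoDB 2.6+ does expose a server-side per-operation limit, maxTimeMS, which you can set from the shell or any driver; a minimal sketch, where the collection name and pipeline are placeholders:

db.reports.aggregate(pipeline, { maxTimeMS: 600000 })   // abort server-side after 10 minutes
db.reports.find(query).maxTimeMS(600000)                // same idea on a find cursor

If you don't set it, the server imposes no time limit of its own on the operation.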

Why does MongoDB *client* use more memory than the server in this case?

I'm evaluating MongoDB. I have a small 20GB subset of documents. Each is essentially a request log for a social game along with some captured state of the game the user was playing at that moment.
I thought I'd try finding game cheaters. So I wrote a function that runs server-side. It calls find() on an indexed collection and sorts according to the existing index, {user_id, time}. Using a cursor, it goes through all documents in index order. So I'm walking each user's history, checking whether certain values (money/health/etc.) increase faster than is possible in the game. The script returns the first violation found; it does not collect violations.
The ONLY thing this script does on the client is define the function and call mymongodb.eval(myscript) against a mongod instance on another box.
The box that mongod is running on does fine. The one the script is launched from starts losing memory and swap; hours later, 8GB of RAM and 6GB of swap are in use on the client machine, which did nothing more than launch a script on another box and wait for a return value.
Is the mongo client really that flaky? Have I done something wrong or made an incorrect assumption about mongo/mongod?
If you just want to open a client connection to a remote database, you should use the mongo command, not mongod. mongod starts up a server on your local machine; I'm not sure what specifying a URL will do there.
Try
mongo remotehost:27017
From the documentation:
Use map/reduce instead of db.eval() for long running jobs. db.eval blocks other operations!
eval is a function that blocks the entire server unless you use a special flag. Again, from the docs:
If you don't use the "nolock" flag, db.eval() blocks the entire mongod process while running [...]
You are kind of abusing MongoDB here. Your current routine is also strange in that it returns the first violation found, yet it will have to re-check everything the next time it runs (unless your user ids are ordered and you store the last evaluated one).
Map/reduce is generally the better option for a long-running task, although aggregating your data does not seem trivial here. A map/reduce-based solution would, however, also solve the re-evaluation problem.
I'd probably return something like this from map/reduce:
user id -> suspicious actions, e.g.
------
2525454 -> [{logId: 235345435, t: ISODate("...")}]
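A rough shell sketch of that shape (everything here is an assumption: the gamelog collection, the user_id/time/money fields, and the MAX_GAIN_PER_SEC rule):

var mapFn = function () {
  // one key per user; carry only the fields needed to spot impossible gains
  emit(this.user_id, { events: [{ t: this.time, money: this.money }] });
};

var reduceFn = function (userId, values) {
  // merge each user's events and keep them in time order
  var all = [];
  values.forEach(function (v) { all = all.concat(v.events); });
  all.sort(function (a, b) { return a.t - b.t; });
  return { events: all };
};

var finalizeFn = function (userId, value) {
  var MAX_GAIN_PER_SEC = 100;   // made-up game rule
  var suspicious = [];
  for (var i = 1; i < value.events.length; i++) {
    var prev = value.events[i - 1], cur = value.events[i];
    var secs = Math.max((cur.t - prev.t) / 1000, 1);
    if ((cur.money - prev.money) / secs > MAX_GAIN_PER_SEC) {
      suspicious.push(cur.t);
    }
  }
  return { suspicious: suspicious };
};

// caveat: collecting all events per user can hit the 16MB document limit
// for very active users; pre-bucketing by (user, day) avoids that
db.gamelog.mapReduce(mapFn, reduceFn, { out: { replace: "suspicious_users" }, finalize: finalizeFn });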

How to track how long some Mongo queries take

I have a few Mongo queries in JS form, such as:
db.hello.update(params, data);
How do I run them in such a way that I can later see exactly how long they took?
There are a few options:
Do your updates with safe=true, which will cause the update call to block until mongod has written the data (the exact syntax for this depends on the driver you're using). You can then add timing code around the updates in your application code and log as appropriate.
Enable verbose (or more-verbose) logging, and use the log files to determine the time spent during your updates. See the mongo docs on logging for more information.
Enable the profiler, which stores information about queries and updates, including the time spent servicing each one, in a capped collection, db.system.profile. Note that enabling the profiler affects performance, though not severely. See the mongo docs on profiling for more information.
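As a concrete shell illustration of the first and third options (the hello collection is taken from the question; mydb and the writeConcern syntax are assumptions for a modern shell, where w: 1 plays the role of safe=true):

var start = new Date();
db.hello.update(params, data, { writeConcern: { w: 1 } });
print("update took " + (new Date() - start) + " ms");

db.setProfilingLevel(2);
db.hello.update(params, data);
db.system.profile.find({ ns: "mydb.hello" }).sort({ ts: -1 }).limit(1);   // check the millis field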