MongoDB verify if fsync is working - perl

I perform an fsync() operation in MongoDB, but it just returns immediately and doesn't appear to have done anything. How do I verify that the data has actually been flushed to disk?
NOTE: I set syncdelay to 0, which means that writes won't be automatically fsync'ed every 60 seconds.
My actual command, using the Perl driver, is:
$connection->fsync({async=>1});
Thanks.

If you don't want the fsync to return immediately, you can remove the async option and it becomes a blocking operation.
But if you want it to stay non-blocking, you can use db.currentOp() from the shell to query the current state of the fsync.
If you want to get that information from Perl, you can use the technique I outlined in this answer. Unfortunately, there's no convenient way to get it directly via run_command.
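The technique boils down to a poll loop over the current-op list. As a rough sketch (in Python rather than Perl, and with a stubbed check standing in for a real db.currentOp() / run_command query against a live server), it could look like:

```python
import time

def wait_for_op(op_finished, timeout=10.0, interval=0.1):
    """Poll until op_finished() reports True, or give up after `timeout` seconds.

    op_finished is a stand-in for a real check, e.g. scanning the output of
    db.currentOp() (or the equivalent run_command call from Perl) for an
    in-progress fsync operation.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if op_finished():
            return True
        time.sleep(interval)
    return False

# Simulated check: pretend the fsync shows up as "in progress" twice,
# then disappears from the current-op list.
state = {"polls": 0}
def fake_fsync_done():
    state["polls"] += 1
    return state["polls"] > 2

print(wait_for_op(fake_fsync_done, timeout=5.0, interval=0.01))  # True
```

The real version would replace fake_fsync_done with a query of the server's current operations and match on the fsync entry.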

Related

Clearing cache in PostgreSQL

My question:
How can I clear the database's cache, so that the same query always takes the "real" (uncached) time to run?
The context:
I'm trying to improve the runtime of a query. The plan is to run the query once, then run EXPLAIN on it, add some relevant indexes based on the explanation's output, and finally run the query again.
I was told that caching inside the database might affect the results of my tests.
What is the simplest way to clear the cache, or to have a clean slate for tests in general?
Restarting the database will clear the database's shared_buffers cache. It will not clear the filesystem cache, which PostgreSQL relies upon heavily.
On Linux, writing 1 to /proc/sys/vm/drop_caches drops the filesystem cache (run sync first, since only clean pages are dropped). Do this after restarting the database. You need to be a privileged user to do that; other operating systems have their own methods.
It is dubious that this produces times that are more "real"; they could easily be less "real". How often do you reboot your production server in reality? It is usually better to write a driver script that runs the same query repeatedly but with different parameters, so that it hits different parts of the data.
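A minimal sketch of such a driver script, using Python's stdlib sqlite3 as a stand-in connection (with PostgreSQL you would substitute a real driver such as psycopg and a connection string; the table, column, and index names here are invented):

```python
import sqlite3
import time

# Build a small sample dataset: 5000 rows spread across 50 user_ids.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i % 50, f"row-{i}") for i in range(5000)],
)
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")

def timed_run(user_id):
    """Run the same query with one parameter value and time it."""
    start = time.perf_counter()
    rows = conn.execute(
        "SELECT COUNT(*) FROM events WHERE user_id = ?", (user_id,)
    ).fetchone()[0]
    return rows, time.perf_counter() - start

# Vary the parameter so successive runs touch different parts of the data,
# rather than re-reading the same hot pages from cache.
timings = [timed_run(uid)[1] for uid in range(50)]
print(f"min={min(timings):.6f}s max={max(timings):.6f}s")
```

Looking at the spread of timings across parameter values gives a more realistic picture than a single cold-cache run.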
DISCARD releases internal resources associated with a database session. This command is useful for partially or fully resetting the session's state. There are several subcommands to release different types of resources; the DISCARD ALL variant subsumes all the others and also resets additional state. DISCARD ALL is equivalent to:
SET SESSION AUTHORIZATION DEFAULT;
RESET ALL;
DEALLOCATE ALL;
CLOSE ALL;
UNLISTEN *;
SELECT pg_advisory_unlock_all();
DISCARD PLANS;
DISCARD SEQUENCES;
DISCARD TEMP;
N.B.: DISCARD ALL cannot be executed inside a transaction block.
A metadata change automatically invalidates the affected plans, so altering the table or creating/dropping indexes will do it, and you don't need to do anything special.
Running ANALYZE also invalidates them.

Is db.stats() a blocking call for MongoDB?

While researching how to check the size of a MongoDB, I found this comment:
Be warned that dbstats blocks your database while it runs, so it's not suitable in production. https://jira.mongodb.org/browse/SERVER-5714
Looking at the linked bug report (which is still open), it quotes the Mongo docs as saying:
Command takes some time to run, typically a few seconds unless the .ns file is very large (via use of --nssize). While running other operations may be blocked.
However, when I check the current Mongo docs, I don't find that text. Instead, they say:
The time required to run the command depends on the total size of the database. Because the command must touch all data files, the command may take several seconds to run.
For MongoDB instances using the WiredTiger storage engine, after an unclean shutdown, statistics on size and count may be off by up to 1000 documents as reported by collStats, dbStats, and count. To restore the correct statistics for the collection, run validate on the collection.
Does this mean the WiredTiger storage engine changed this to a non-blocking call by keeping ongoing stats?
A bit late to the game, but I found this question while looking for the answer, and the answer is: yes. Until 3.6.12 / 4.0.5, dbStats acquired a "shared" lock ("R"), which blocked all write requests during its execution. Since then it acquires an "intent shared" lock ("r"), which doesn't block write requests. Read requests were never impacted.
Source: https://jira.mongodb.org/browse/SERVER-36437

Whether MongoDB has logs for each insertion and removal

I am wondering whether MongoDB has logs for each insertion and removal, i.e. monitoring or backup capabilities?
You can actually create an extremely verbose log of writes, reads, and all that.
When you run mongod you can pass the --diaglog parameter (http://docs.mongodb.org/manual/reference/mongod/#cmdoption-mongod--diaglog), which, when set to 1, will log every single write operation, including insertions and deletions.
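For reference, the relevant invocation is just the flag on the mongod command line (note that --diaglog existed only in older mongod releases and has since been removed, so treat this as historical):

```shell
# value 1 = log every write operation, per the linked docs page
mongod --dbpath /data/db --diaglog 1
```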
Look at the oplog; it could be what you are looking for. See docs.mongodb.org and www.briancarpio.com for background.
By default, only queries that take longer than slowms are logged. However, you can log every query by setting the profiling level to 2, e.g. db.setProfilingLevel(2).
See here for more details

Why does MongoDB *client* use more memory than the server in this case?

I'm evaluating MongoDB. I have a small 20GB subset of documents. Each is essentially a request log for a social game along with some captured state of the game the user was playing at that moment.
I thought I'd try finding game cheaters. So I wrote a function that runs server side. It calls find() on an indexed collection and sorts according to the existing index. Using a cursor it goes through all documents in indexed order. The index is {user_id,time}. So I'm going through each user's history, checking if certain values (money/health/etc) increase faster than is possible in the game. The script returns the first violation found. It does not collect violations.
The ONLY thing this script does on the client is define the function and call mymongodb.eval(myscript) against a mongod instance on another box.
The box that mongod is running on does fine. The one the script is launched from starts eating memory and swap; hours later, 8 GB of RAM and 6 GB of swap are in use on the client machine that did nothing more than launch a script on another box and wait for a return value.
Is the mongo client really that flaky? Have I done something wrong or made an incorrect assumption about mongo/mongod?
If you just want to open a client connection to a remote database, you should use the mongo command, not mongod. mongod starts up a server on your local machine; I'm not sure what specifying a URL there will do.
Try
mongo remotehost:27017
From the documentation:
Use map/reduce instead of db.eval() for long running jobs. db.eval blocks other operations!
eval is a function that blocks the entire server if you don't use a special flag. Again, from the docs:
If you don't use the "nolock" flag, db.eval() blocks the entire mongod process while running [...]
You are kind of abusing MongoDB here. Your current routine is strange because it returns the first violation found, but it will have to re-check everything the next time it runs (unless your user ids are ordered and you store the last evaluated user id).
Map/reduce is generally the better option for a long-running task, but aggregating your data does not seem trivial. However, a map/reduce-based solution would also solve the re-evaluation problem.
I'd probably return something like this from map/reduce:
user id -> suspicious actions, e.g.
------
2525454 -> [{logId: 235345435, t: ISODate("...")}]
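To make the aggregation concrete, here is a hedged client-side sketch of the per-user scan in plain Python over already-fetched documents. The field names (user_id, money, log_id) and the MAX_GAIN threshold are invented for illustration; the real check would cover health and the other tracked values too:

```python
MAX_GAIN = 100  # hypothetical per-interval cap on legitimate money gain

def suspicious_actions(logs):
    """Scan request logs sorted by (user_id, time) and collect every entry
    where a user's money jumps by more than the game allows between requests.
    Returns {user_id: [log_id, ...]}, mirroring the map/reduce output above."""
    violations = {}
    prev = {}  # user_id -> last seen money value
    for entry in logs:  # assumed already sorted via the {user_id, time} index
        uid = entry["user_id"]
        money = entry["money"]
        if uid in prev and money - prev[uid] > MAX_GAIN:
            violations.setdefault(uid, []).append(entry["log_id"])
        prev[uid] = money
    return violations

logs = [
    {"user_id": 1, "log_id": 10, "money": 50},
    {"user_id": 1, "log_id": 11, "money": 120},   # +70: within the cap
    {"user_id": 1, "log_id": 12, "money": 9000},  # +8880: flagged
    {"user_id": 2, "log_id": 20, "money": 30},
    {"user_id": 2, "log_id": 21, "money": 60},    # +30: within the cap
]
print(suspicious_actions(logs))  # {1: [12]}
```

Unlike the first-violation-only routine, this collects all violations per user, which is what makes incremental re-evaluation unnecessary.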

difference between db.runCommand({getlasterror:1,fsync:true}) and db.runCommand({getlasterror:1}) in MongoDB?

I understand that getlasterror guarantees that the write has been done to a file.
This means that even if the computer's power goes off, the previous write is still safe.
But what is the use of fsync:true?
Essentially, getLastError checks for an error in the last database operation on the current connection. If you run the command with the fsync option, it will also flush the data to the data files (by default, MongoDB does this every 60 seconds).
You can find more details here and here.