How to view a log of all writes to MongoDB - mongodb

I'm able to see queries in MongoDB, but I've tried to see what writes are being performed on a MongoDB database without success.
My application code doesn't have any write commands in it. Yet, when I load test my app, I'm seeing a whole bunch of writes in mongostat. I'm not sure where they're coming from.
Aside from logging writes (which I'm unable to do), are there any other methods that I can use to determine where those writes are coming from?

You have a few options that I'm aware of:
a) If you suspect that the writes are going to a particular database, you can set the profiling level to 2 to log all queries
use [database name]
db.setProfilingLevel(2)
...
// disable when done
db.setProfilingLevel(0)
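With profiling at level 2, the profiler's records land in the system.profile collection of that database, so you can filter them for write operations. A sketch, to be run in the mongo shell (the op values shown are the profiler's names for the basic write operations):
```
use [database name]
// show the 10 most recent writes recorded by the profiler
db.system.profile.find(
    { op: { $in: ["insert", "update", "remove"] } }
).sort({ ts: -1 }).limit(10)
```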
b) You can start the database with various levels of verbosity using -v
-v [ --verbose ] be more verbose (include multiple times for more
verbosity e.g. -vvvvv)
c) You can use mongosniff to sniff the port
d) If you're using replication, you could also check the local.oplog.rs collection
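For option (d), a mongo-shell query like the following (a sketch) shows the most recent write operations recorded in the oplog; op is "i" for inserts, "u" for updates and "d" for deletes:
```
use local
db.oplog.rs.find(
    { op: { $in: ["i", "u", "d"] } }
).sort({ $natural: -1 }).limit(10)
```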

I've tried all of jeffl's suggestions, and one of them was able to show me the writes: mongosniff. Thanks jeffl!
Here are the commands that I used to install mongosniff on my Ubuntu 10 box, in case someone else finds this useful:
git clone git://github.com/mongodb/mongo.git
cd mongo
git checkout r2.4.6
apt-get install scons libpcap-dev g++
scons mongosniff
build/linux2/normal/mongo/mongosniff --source NET lo 27017

I made a command-line tool to view the logs and also activate the profiler first, without the need for other client tools: "mongotail".
To activate profiling at level 2:
mongotail databasename -l 2
Then to show the latest 10 queries:
mongotail databasename
You can also use the tool with the -f option, to see the changes in "real time".
mongotail databasename -f
And finally, filter the result with egrep to find particular operations, like showing only write operations:
mongotail databasename -f | egrep "(INSERT|UPDATE|REMOVE)"
See documentation and installation instructions in: https://github.com/mrsarm/mongotail
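To illustrate how that egrep filter behaves, here is a runnable sketch using made-up sample lines in a mongotail-like format (the real output format may differ):

```shell
# Made-up sample log lines; only lines matching the pattern survive.
printf '%s\n' \
  '2024-05-01 10:00:01 QUERY  [users]: {}' \
  '2024-05-01 10:00:02 INSERT [users]: {"name": "ann"}' \
  '2024-05-01 10:00:03 UPDATE [users]: {"$set": {"age": 30}}' \
  | egrep "(INSERT|UPDATE|REMOVE)"
```

The QUERY line is filtered out; the INSERT and UPDATE lines pass through.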

Related

running ycsb workloads on mongodb

I am running the YCSB tool on MongoDB to benchmark the database, and I notice that once I load a workload (workloada, for example) and run a transaction phase (target 1500, for example), I am not able to run another transaction phase without dropping the entire database and loading it again. If I run another transaction phase without dropping and reloading the database, I get a "duplicate key error".
It looks like the first transaction phase inserted some keys which the second one also tries to insert. Is there a workaround for this? Or is there something wrong with what I am doing?
This is the command I use for loading:
./bin/ycsb load mongodb -P workloads/workloada \
  -p mongodb.url=<ip_address>:27020 \
  -p mongodb.maxconnections=150 -s \
  -p mongodb.writeConcern=normal \
  -target 3500 -threads 200 > <output-file>
Here is the command I use for the transaction phase:
./bin/ycsb load mongodb \
  -P workloads/workloada \
  -p mongodb.url=<IP_address>:27020 \
  -p mongodb.maxconnections=100 -s \
  -p mongodb.writeConcern=normal \
  -target 1500 -threads 100 > <output_file>
Once you load, you can run YCSB as many times as you want.
But loading again will give you the error, as the records are already loaded. Hence, you will have to delete the data directory into which you loaded MongoDB before loading again.
The question is old, but I'm adding a response anyway.
That's how it is meant to behave: the load phase ideally runs once, inserting data into MongoDB, followed by whatever workloads you intend to run.
Look here in the YCSB wiki for an example of a sequence in which workloads can be run. That page in the wiki runs through the list of everything needed to run a test.
If load is what you intend to benchmark then you should drop the collection and the database between and before your 'load' operations.
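Sketched as commands (note that YCSB's transaction phase uses the run subcommand; the question's second command reuses load, which re-inserts the same records and is what triggers the duplicate key error):
```
# Load phase - run once:
./bin/ycsb load mongodb -P workloads/workloada -p mongodb.url=<ip_address>:27020 -s
# Transaction phase - "run", not "load"; repeat as often as needed:
./bin/ycsb run mongodb -P workloads/workloada -p mongodb.url=<ip_address>:27020 -target 1500 -threads 100 -s
```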

How to get a consistent MongoDB backup for a single node setup

I'm using MongoDB in a pretty simple setup and need a consistent backup strategy. I found out the hard way that wrapping a mongodump in a lock/unlock is a bad idea. Then I read that the --oplog option should be able to provide consistency without lock/unlock. However, when I tried that, it said that I could only use the --oplog option on a "full dump." I've poked around the docs and lots of articles but it still seems unclear on how to dump a mongo database from a single point in time.
For now I'm just going with a normal dump but I'm assuming that if there are writes during the dump it would make the backup not from a single point in time, correct?
mongodump -h $MONGO_HOST:$MONGO_PORT -d $MONGO_DATABASE -o ./${EXPORT_FILE} -u backup -p password --authenticationDatabase admin
In a production environment, MongoDB is typically deployed as a replica set to ensure redundancy and high availability. There are a few options available for point-in-time backup if you are running a standalone mongod instance.
One option, as you have mentioned, is to do a mongodump with the --oplog option. However, this option is only available if you are running a replica set. You can convert a standalone mongod instance to a single-node replica set easily, without adding any new replica set members. Please check the following document for details.
http://docs.mongodb.org/manual/tutorial/convert-standalone-to-replica-set/
This way, if there are writes while mongodump is running, they will be part of your backup. Please see Point in Time Operation Using Oplogs section from the following link.
http://docs.mongodb.org/manual/tutorial/backup-databases-with-binary-database-dumps/#point-in-time-operation-using-oplogs
Be aware that using mongodump and mongorestore to back up and restore MongoDB can be slow.
Filesystem snapshots are another option. The following link details two snapshot options for performing a hot backup of MongoDB.
http://docs.mongodb.org/manual/tutorial/backup-databases-with-filesystem-snapshots/
You can also look into MongoDB backup service.
http://www.10gen.com/products/mongodb-backup-service
In addition, mongodump with the --oplog option does not work with a single database or collection at the moment. There are plans to implement the feature; you can follow the ticket and vote for it under the More Actions button.
https://jira.mongodb.org/browse/SERVER-4273

Moving MongoDB's data folder?

I have 2 computers in different places (so it's impossible to use the same wifi network).
One contains about 50GBs of data (MongoDB files) that I want to move to the second one which has much more computation power for analysis. But how can I make MongoDB on the second machine recognize that folder?
When you start the mongod process you provide it with the argument --dbpath /directory, which is how it knows where the data folder is.
All you need to do is:
Stop the mongod process on the old computer and wait till it exits.
Copy the entire /data/db directory to the new computer.
Start the mongod process on the new computer, giving it the --dbpath /newdirectory argument.
The mongod on the new machine will use the folder you indicate with --dbpath. There is no need for it to "recognize" anything, as there is nothing machine-specific in that folder; it's just data.
I did this myself recently, and I wanted to provide some extra considerations to be aware of, in case readers (like me) run into issues.
The following information is specific to *nix systems, but it may be applicable with very heavy modification to Windows.
If the source data is in a mongo server that you can still run (preferred)
Look into and make use of mongodump and mongorestore. That is probably safer, and it's the official way to migrate your database.
If you never made a dump and can't anymore
Yes, the data directory can be directly copied; however, you also need to make sure that the mongodb user has complete access to the directory after you copy it.
My steps are as follows. On the machine you want to transfer an old database to:
Edit /etc/mongod.conf and change the dbPath field to the desired location.
Use the following script as a reference, or tailor it and run it on your system at your own risk.
I do not guarantee this works on every system, so please verify it manually.
I also cannot guarantee it works perfectly in every case.
WARNING: it will delete everything in the target data directory you specify.
I can say, however, that it worked on my system, and that it passes shellcheck.
The important part is simply copying over the old database directory, and giving mongodb access to it through chown.
#!/bin/bash
TARGET_DATA_DIRECTORY=/path/to/target/data/directory # modify this
SOURCE_DATA_DIRECTORY=/path/to/old/data/directory    # modify this too

echo "shutting down mongod..."
sudo systemctl stop mongod

if test "$TARGET_DATA_DIRECTORY"; then
    echo "removing existing data directory..."
    sudo rm -rf "$TARGET_DATA_DIRECTORY"
fi

echo "copying backed up data directory..."
sudo cp -r "$SOURCE_DATA_DIRECTORY" "$TARGET_DATA_DIRECTORY"
sudo chown -R mongodb "$TARGET_DATA_DIRECTORY"

echo "starting mongod back up..."
sudo systemctl start mongod
sudo systemctl status mongod # for verification
Quite easy for Windows: just move the data folder to the target location, then run in cmd:
"C:\your\mongodb\bin-path\mongod.exe" --dbpath="c:\what\ever\path\data\db"
On Windows, if you just need to configure a new path for the data, all you need to do is create a new folder, for example D:\dev\mongoDb-data, open C:\Program Files\MongoDB\Server\6.0\bin\mongod.cfg, and change the path there.
Then restart your PC. Check the folder - it should contain new files/folders with data.
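The relevant setting in mongod.cfg is storage.dbPath; a sketch of the excerpt, using the example folder above:
```yaml
# mongod.cfg excerpt (YAML)
storage:
  dbPath: D:\dev\mongoDb-data
```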
Maybe what you didn't do was export or dump the database.
Databases aren't portable and therefore must be exported or created as a dump file.
Here is another question where the answer is explained further.

App to monitor PostgreSQL queries in real time?

I'd like to monitor the queries getting sent to my database from an application. To that end, I've found pg_stat_activity, but more often than not, the rows which are returned read " in transaction". I'm either doing something wrong, am not fast enough to see the queries come through, am confused, or all of the above!
Can someone recommend the most idiot-proof way to monitor queries running against PostgreSQL? I'd prefer some sort of easy-to-use UI based solution (example: SQL Server's "Profiler"), but I'm not too choosy.
PgAdmin offers a pretty easy-to-use tool called server monitor
(Tools -> Server Status)
With PostgreSQL 8.4 or higher you can use the contrib module pg_stat_statements to gather query execution statistics of the database server.
Run the SQL script of this contrib module, pg_stat_statements.sql (on Ubuntu it can be found in /usr/share/postgresql/<version>/contrib), in your database and add this sample configuration to your postgresql.conf (requires a restart):
custom_variable_classes = 'pg_stat_statements'
pg_stat_statements.max = 1000
pg_stat_statements.track = top # top,all,none
pg_stat_statements.save = off
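Once it is set up, the gathered statistics can be inspected with a query like this (a sketch; in the 8.4-era module the view exposes query, calls and total_time, among other columns):
```sql
-- top 10 statements by cumulative execution time
SELECT query, calls, total_time
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;
```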
To see what queries are executed in real time you might want to just configure the server log to show all queries, or only queries with a minimum execution time. To do so, set the logging configuration parameters log_statement and log_min_duration_statement in your postgresql.conf accordingly.
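For example, a postgresql.conf excerpt (a sketch; pick one of the two depending on whether you want everything or only slow statements):
```
log_statement = 'all'               # log every statement
log_min_duration_statement = 200    # log statements that run for >= 200 ms
```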
pg_activity is what we use.
https://github.com/dalibo/pg_activity
It's a great tool with a top-like interface.
You can install and run it on Ubuntu 21.10 with:
sudo apt install pg-activity
pg_activity
If you are using Docker Compose, you can add this line to your docker-compose.yaml file:
command: ["postgres", "-c", "log_statement=all"]
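In context, a minimal docker-compose.yaml sketch (the service name and image tag are examples):
```yaml
services:
  postgres-service-name:
    image: postgres:16
    command: ["postgres", "-c", "log_statement=all"]
```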
now you can see the Postgres query logs in the Compose output with
docker-compose logs -f
or, if you want to see only the Postgres service's logs,
docker-compose logs -f [postgres-service-name]
https://stackoverflow.com/a/58806511/10053470
I haven't tried it myself unfortunately, but I think that pgFouine can show you some statistics.
It seems it does not show you queries in real time, but rather generates a report of queries afterwards - perhaps it still satisfies your needs?
You can take a look at
http://pgfouine.projects.postgresql.org/

See and clear Postgres caches/buffers?

Sometimes I run a Postgres query and it takes 30 seconds. Then, I immediately run the same query and it takes 2 seconds. It appears that Postgres has some sort of caching. Can I somehow see what that cache is holding? Can I force all caches to be cleared for tuning purposes?
I'm basically looking for a Postgres version of the following SQL Server command:
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
But I would also like to know how to see what is actually contained in that buffer.
You can see what's in the PostgreSQL buffer cache using the pg_buffercache module. I've done a presentation called "Inside the PostgreSQL Buffer Cache" that explains what you're seeing, and in it I show some more complicated queries to help interpret that information.
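As a sketch, the classic per-relation summary query against pg_buffercache looks like this (run it in the database being inspected, after installing the module; the join on relfilenode is approximate across databases):
```sql
-- relations occupying the most shared_buffers pages
SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = c.relfilenode
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;
```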
It's also possible to look at the operating system cache on some systems; see pg_osmem.py for one somewhat rough example.
There's no way to clear the caches easily. On Linux you can stop the database server and use the drop_caches facility to clear the OS cache; be sure to heed the warning there to run sync first.
I haven't seen any commands to flush the caches in PostgreSQL. What you see is likely just normal index and data caches being read from disk and held in memory, by both PostgreSQL and the OS. To get rid of all that, the only way I know of is:
Shutdown the database server (pg_ctl, sudo service postgresql stop, sudo systemctl stop postgresql, etc.)
echo 3 > /proc/sys/vm/drop_caches
This will clear out the OS file/block caches - very important, though I don't know how to do that on other OSes. (If you get permission denied, try sudo sh -c "echo 3 > /proc/sys/vm/drop_caches" as in that question.)
Start the database server (e.g. sudo service postgresql start, sudo systemctl start postgresql)
Greg Smith's answer about drop_caches was very helpful. I did find it necessary to stop and start the postgresql service, in addition to dropping the caches. Here's a shell script that does the trick. (My environment is Ubuntu 14.04 and PostgreSQL 9.3.)
#!/usr/bin/sudo bash
service postgresql stop
sync
echo 3 > /proc/sys/vm/drop_caches
service postgresql start
I tested with a query that took 19 seconds the first time, and less than 2 seconds on subsequent attempts. After running this script, the query once again took 19 seconds.
I use this command on my linux box:
sync; /etc/init.d/postgresql-9.0 stop; echo 1 > /proc/sys/vm/drop_caches; /etc/init.d/postgresql-9.0 start
It completely gets rid of the cache.
I had this error.
psql:/cygdrive/e/test_insertion.sql:9: ERROR: type of parameter 53
(t_stat_gardien) does not match that when preparing the plan
(t_stat_avant)
I was looking for a way to flush the current plan and found this:
DISCARD PLANS
I put it between my inserts and it solved my problem.
Yes, it is possible to clear both the shared buffers Postgres cache AND the OS cache. The solution below is for Windows... others have already given the Linux solution.
As many people already said, to clear the shared buffers you can just restart Postgres (no need to restart the server). But just doing this won't clear the OS cache.
To clear the OS cache used by Postgres, after stopping the service, use the excellent RamMap (https://technet.microsoft.com/en-us/sysinternals/rammap), from the excellent Sysinternals Suite.
Once you execute RamMap, just click "Empty"->"Empty Standby List" in the main menu.
Restart Postgres and you'll see that your next query is damn slow, due to there being no cache at all.
You can also run RamMap without closing Postgres, and you will probably get the "no cache" results you want, since as people already said, shared buffers usually have little impact compared to the OS cache. But for a reliable test, I would rather stop Postgres before clearing the OS cache, just to make sure.
Note: AFAIK, I don't recommend clearing anything besides the "Standby list" when using RamMap, because the other data is somehow in use, and you can potentially cause problems/lose data if you do. Remember that you are clearing memory used not only by Postgres files, but by every other app and the OS as well.
Yes, PostgreSQL certainly has caching. The size is controlled by the setting shared_buffers. Other than that, as the previous answer mentions, there is the OS file cache, which is also used.
If you want to look at what's in the cache, there is a contrib module called pg_buffercache available (in contrib/ in the source tree, in the contrib RPM, or wherever is appropriate for how you installed it). How to use it is listed in the standard PostgreSQL documentation.
There is no way to clear out the buffer cache, other than to restart the server. You can drop the OS cache with the command mentioned in the other answer - provided your OS is Linux.
There is the pg_buffercache module to look into the shared_buffers cache. And at some point I needed to drop the cache to run some performance tests on a 'cold' cache, so I wrote a pg_dropcache extension that does exactly this. Please check it out.
This is my shortcut:
echo 1 > /proc/sys/vm/drop_caches; echo 2 > /proc/sys/vm/drop_caches; echo 3 > /proc/sys/vm/drop_caches; rcpostgresql stop; rcpostgresql start;
If you have a dedicated test database, you can set the shared_buffers parameter to 16. That should disable the cache for all queries.
The original heading was "See and Clear" buffers.
Postgres 13 with the pg_buffercache extension provides a way to see the buffer contents - see the doc page.
On OSX there is a purge command for that:
sync && sudo purge
sync - force completion of pending disk writes (flush cache)
purge - force disk cache to be purged (flushed and emptied)
Credit goes to kenorb, answering "echo 3 > /proc/sys/vm/drop_caches on Mac OSX".