Running YCSB workloads on MongoDB

I am running the YCSB tool against MongoDB to benchmark the database, and I notice that once I load a workload (workloada, for example) and run a transaction phase (target 1500, for example), I am not able to run another transaction phase without dropping the entire database and loading it again. If I run another transaction phase without dropping and reloading the database, I get a "duplicate key error".
It looks like the first transaction phase inserted some keys which the second one also tries to insert. Is there a workaround for this, or is there something wrong with what I am doing?
This is the command I use for loading:
./bin/ycsb load mongodb -P workloads/workloada
-p mongodb.url=<ip_address>:27020
-p mongodb.maxconnections=150 -s
-p mongodb.writeConcern=normal
-target 3500 -threads 200 > <output-file>
Here is the command I use for the transaction phase:
./bin/ycsb load mongodb
-P workloads/workloada
-p mongodb.url=<IP_address>:27020
-p mongodb.maxconnections=100 -s
-p mongodb.writeConcern=normal
-target 1500 -threads 100 > <output_file>

Once you have loaded the data, you can run YCSB as many times as you want.
But loading again will give you this error, because the records are already loaded. Hence, before loading again you will have to delete the directory MongoDB is storing its data in.
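Instead of wiping the data directory, you can also drop just the loaded records from the mongo shell before the next load. A sketch, assuming YCSB's default database name ycsb and collection usertable (adjust these if you have overridden them):
use ycsb
db.usertable.drop()
// or, to remove everything in the database:
db.dropDatabase()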

The question is old, but adding a response anyway.
That is how it is meant to behave: the load phase ideally runs once, inserting the data into MongoDB, followed by whatever workloads you intend to run.
The YCSB wiki has an example of a sequence in which workloads can be run, and a page that walks through everything needed to run a test.
If the load itself is what you intend to benchmark, then you should drop the collection and the database between and before your 'load' operations.
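A sketch of that sequence, reusing the connection settings from the question. Note that the transaction phases below use ycsb run rather than ycsb load; the second command in the question reuses load, which is what re-inserts the already loaded keys:
./bin/ycsb load mongodb -P workloads/workloada -p mongodb.url=<ip_address>:27020 -s -target 3500 -threads 200
./bin/ycsb run mongodb -P workloads/workloada -p mongodb.url=<ip_address>:27020 -s -target 1500 -threads 100
./bin/ycsb run mongodb -P workloads/workloadb -p mongodb.url=<ip_address>:27020 -s -target 1500 -threads 100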


mongorestore - AtlasError - getting non-bucket system collections is unsupported

Failed: (AtlasError) getting non-bucket system collections is unsupported
I'm trying to use mongorestore to migrate data from one database to another. Both on Atlas. The dump is working fine, but mongorestore is outputting the below messages. I have no clue what this means and a google search renders nothing remotely close. I've added hyphens to the beginning of each line of the output to make it more readable.
-- using write concern: &{majority false 0}
-- will listen for SIGTERM, SIGINT, and SIGKILL
-- connected to node type: replset
-- The --db and --collection flags are deprecated for this use-case; please use --nsInclude instead, i.e. with --nsInclude=${DATABASE}.${COLLECTION}
-- got error from options parsing: (AtlasError) getting non-bucket system collections is unsupported
-- Failed: (AtlasError) getting non-bucket system collections is unsupported
-- 0 document(s) restored successfully. 0 document(s) failed to restore.
The command I'm running
mongorestore --uri="mongodb+srv://username:password@hostname.mongodb.net/$DEV_DATABASE" \
--preserveUUID \
--drop \
--nsFrom="$PROD_DATABASE.*" \
--nsTo="$DEV_DATABASE.*" \
--verbose \
"dump/$PROD_DATABASE"
I've also tried creating an archive file with mongodump and using that with --archive="filename", as well as piping stdout to mongorestore. I've also checked that the user I'm using has the correct privileges. They have the role of Atlas admin, which I'm assuming is correct. The dev cluster I'm trying to restore to is an M0 if that makes any difference.
I should also point out that I have minimal Mongo management experience, so I'm sure there's something I've overlooked. Thanks for your help.
MongoDB records the collection UUIDs in a separate system collection.
The --preserveUUID option instructs mongorestore to create each collection and force it to use the UUID from the source system.
The error message indicates that Atlas is refusing to allow you to access or modify that system collection.
Run without the --preserveUUID option when restoring to Atlas.
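A sketch of the same restore with that flag removed (all other flags and placeholders as in the question):
mongorestore --uri="mongodb+srv://username:password@hostname.mongodb.net/$DEV_DATABASE" \
--drop \
--nsFrom="$PROD_DATABASE.*" \
--nsTo="$DEV_DATABASE.*" \
--verbose \
"dump/$PROD_DATABASE"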

Issues when upgrading and dockerising a Postgres v9.2 legacy database using pg_dumpall and pg_dump

I am using an official postgres v12 docker image that I want to initialise with two SQL dump files that are gathered from a remote legacy v9.2 postgres server during the docker build phase:
RUN ssh $REMOTE_USER#$REMOTE_HOST "pg_dumpall -w -U $REMOTE_DB_USER -h localhost -p $REMOTE_DB_PORT --clean --globals-only -l $REMOTE_DB_NAME" >> dump/a_globals.sql
RUN ssh $REMOTE_USER#$REMOTE_HOST "pg_dump -w -U $REMOTE_DB_USER -h localhost -p $REMOTE_DB_PORT --clean --create $REMOTE_DB_NAME" >> dump/b_db.sql
Placing both a_globals.sql and b_db.sql into the image's docker-entrypoint-initdb.d folder causes the database to be initialised from the legacy SQL files when the v12 container starts (as described in the official postgres image documentation). Docker is working correctly and the dump files are retrieved successfully. However, I am running into problems initialising the container's database and need guidance:
1. When the container starts to initialise its DB, it stops with ERROR: role $someDBRole does not exist. This is because the v9.2 dump SQL files DROP roles before recreating them, and the container's DB does not like this. Unfortunately, it is not until v9.4 that pg_dumpall and pg_dump gained the --if-exists option (see the pg_dumpall v9.2 documentation). What would you suggest I do to remedy this? I could manually edit the SQL dump files, but that would be impractical, as snapshots of the legacy DB need to be automated. Is there a way to suppress this error during container startup?
2. If I want to convert from ASCII to UTF-8, is it adequate to simply set the encoding option for pg_dumpall and pg_dump, or do I need to take other issues into consideration when upgrading?
3. Is there a way to suppress the removal and re-creation of the postgres superuser that is in the dump SQL?
4. In general, are there any other gotchas when containerising and/or upgrading a postgres DB?
I'm not familiar with Docker, so I don't know how straightforward these things will be to do, but in general, pg_dump/pg_dumpall output in SQL format will work just fine after having gone through some ugly string manipulation.
1. Pipe it through sed -e 's/DROP ROLE/DROP ROLE IF EXISTS/', ideally when writing the .sql files, but it's fine to just run sed -i -e <...> to munge the files in place after they're created if you don't have a full shell available (see the sketch after this list). Make it sed -r -e 's/^DROP ROLE/DROP ROLE IF EXISTS/' if you're worried about your data containing the string DROP ROLE, at the cost of portability (AFAIK -r is a GNU addition to sed).
2. Yes. It's worth checking the data in pg12 to make sure it got imported correctly, but in the general case, pg_dump has been aware of encoding considerations since time immemorial, and a dump->load is absolutely the best way to change your DB encoding.
3. Sure. Find the lines that do it in your .sql, copy enough of them to be unique, and pipe the file through grep -v <what you copied> :D
4. I can't speak to the containerizing aspect of things, but (and this is more of a general practice, not really PG-specific) if you're dealing with a large DB that's getting migrated, prepare a small one, as similar as possible to the real one but omitting any bulky data, to test with until everything works, so that doing the real migration is just a matter of changing some vars (I guess $REMOTE_HOST and $REMOTE_PORT in your case). If it's not large, then just be comfortable blowing away any pg12 containers that failed partway through the import, figure out and fix whatever caused the failure, and start from the top again until it works end-to-end.
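A sketch of point 1 applied to the first Dockerfile line from the question (same variables as above; only the sed filter is new), rewriting DROP ROLE as the dump is written:
RUN ssh $REMOTE_USER@$REMOTE_HOST "pg_dumpall -w -U $REMOTE_DB_USER -h localhost -p $REMOTE_DB_PORT --clean --globals-only -l $REMOTE_DB_NAME" | sed -e 's/^DROP ROLE/DROP ROLE IF EXISTS/' >> dump/a_globals.sql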

Restoring PostgreSQL 11 backup to 12 hangs. How can I debug it?

I'm attempting to upgrade Heroku PostgreSQL instances from pg11 to pg12 using the copy method, as my testing environments are on hobby instances. At the end of the process it appears to hang for a long time (it does not exit after more than 30 minutes for a 120 MB database). The datastore view suggests everything is fine and I have the same number of rows, but there are issues.
It appears to be the fault of a materialized view. If I connect to the database and look through the tables and views, only one appears to be empty. Postico waits and waits for the view's structure, but doesn't give the usual warning for an unpopulated view.
I can recreate the stalling behaviour by creating a local pg12 database and attempting to use pg_restore with a recent backup. Along the same lines, I appear to be able to get it working by creating an empty local database, running all the db migrations, truncating all tables and sequences, and then doing a --data-only --disable-triggers load from the same backup. Not a particularly smooth or inspiring migration plan. Using --verbose doesn't show any obvious errors; the last thing I see is that it's creating the problematic materialized view.
I've also set log_statement to all, and the last statement logged is the refresh of the problematic view. At that point, the postgres process starts using ~100% CPU.
Locally, I'm using this command to restore:
pg_restore --verbose --clean --no-acl --no-owner -h localhost -d database_name database_backup.dump
This is the command we use regularly to restore production backups for local development.
Are there any known gotchas with upgrading from 11 to 12, or ways that I might be able to extract more information about what's going on?
It has probably chosen an appalling plan for doing the materialized view query, due to lack of statistics at the time the refresh was launched.
You could kill the process, then restart the refresh once stats are gathered (which they might already be).
If starting from scratch, you could run pg_restore with --section=pre-data and --section=data, then do an ANALYZE, then run the post-data section.
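A sketch of that sequence, reusing the local restore command from the question (the materialized view refresh happens in the post-data section, so by then the ANALYZE statistics are in place):
pg_restore --verbose --clean --no-acl --no-owner -h localhost -d database_name --section=pre-data database_backup.dump
pg_restore --verbose --no-acl --no-owner -h localhost -d database_name --section=data database_backup.dump
psql -h localhost -d database_name -c "ANALYZE"
pg_restore --verbose --no-acl --no-owner -h localhost -d database_name --section=post-data database_backup.dump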

Can I restore data from mongo oplog?

My MongoDB deployment was hacked today. All the data was deleted, and the hacker is demanding a ransom to get it back. I will not pay him, because I know he will not send my database back.
But I did have the oplog turned on, and I can see it contains over 300,000 documents recording all operations.
Is there any tool that can restore my data from these logs?
Depending on how far back your oplog is, you may be able to restore the deployment. I would recommend taking a backup of the current state of your dbpath just in case.
Note that there are many variables in play for doing a restore like this, so success is never a guarantee. It can be done using mongodump and mongorestore, but only if your oplog goes back to the beginning of time (i.e. when the deployment was first created). If it does, you may be able to restore your data. If it does not, you'll see errors during the process.
1. Secure your deployment before doing anything else. This situation arises due to a lack of security; there are extensive security features available in MongoDB. Check out the Security Checklist page for details.
2. Dump the oplog collection using mongodump --host <old_host> --username <user> --password <pwd> -d local -c oplog.rs -o oplogDump.
3. Check the content of the oplog to determine the timestamp at which the offending drop operation occurred, using bsondump oplogDump/local/oplog.rs.bson. You're looking for a line that looks approximately like this:
{"ts":{"$timestamp":{"t":1502172266,"i":1}},"t":{"$numberLong":"1"},"h":{"$numberLong":"7041819298365940282"},"v":2,"op":"c","ns":"test.$cmd","o":{"dropDatabase":1}}
This line means that a dropDatabase() command was executed on the test database.
Keep note of the t value in {"$timestamp":{"t":1502172266,"i":1}}.
4. Restore to a secure new deployment using mongorestore --host <new_host> --username <user> --password <pwd> --oplogReplay --oplogLimit=1502172266 --oplogFile=oplogDump/local/oplog.rs.bson oplogDump.
Note the parameter to --oplogLimit, which basically tells mongorestore to stop replaying the oplog once it hits that timestamp (the timestamp of the dropDatabase command found in Step 3).
The --oplogFile parameter is new in MongoDB 3.4. For older versions, you would need to copy oplogDump/local/oplog.rs.bson into the root of the dump directory as a file named oplog.bson (e.g. oplogDump/oplog.bson) and remove the --oplogFile parameter from the example command above.
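For example, with the dump layout above, the pre-3.4 variant would look something like this (same placeholders as in Step 4):
cp oplogDump/local/oplog.rs.bson oplogDump/oplog.bson
mongorestore --host <new_host> --username <user> --password <pwd> --oplogReplay --oplogLimit=1502172266 oplogDump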
After Step 4, if your oplog goes back to the beginning of time and you stopped the oplog replay at the right timestamp, you should see your data as it was just before the dropDatabase command was executed.

How to view a log of all writes to MongoDB

I'm able to see queries in MongoDB, but I haven't had any success seeing what writes are being performed on a MongoDB database.
My application code doesn't have any write commands in it. Yet, when I load test my app, I'm seeing a whole bunch of writes in mongostat. I'm not sure where they're coming from.
Aside from logging writes (which I'm unable to do), are there any other methods that I can use to determine where those writes are coming from?
You have a few options that I'm aware of:
a) If you suspect that the writes are going to a particular database, you can set the profiling level to 2 to log all operations against it (see the sketch after this list):
use [database name]
db.setProfilingLevel(2)
...
// disable when done
db.setProfilingLevel(0)
b) You can start the database with various levels of verbosity using -v
-v [ --verbose ] be more verbose (include multiple times for more
verbosity e.g. -vvvvv)
c) You can use mongosniff to sniff the port
d) If you're using replication, you could also check the local.oplog.rs collection
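For option (a), once profiling is enabled, one way to pull just the write operations back out of the profiler collection (a sketch; run it in the mongo shell against the database you profiled):
use [database name]
// show the most recent write operations recorded by the profiler
db.system.profile.find({ op: { $in: ["insert", "update", "remove"] } }).sort({ ts: -1 }).limit(10)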
I've tried all of jeffl's suggestions, and one of them was able to show me the writes: mongosniff. Thanks jeffl!
Here are the commands that I used to install mongosniff on my Ubuntu 10 box, in case someone else finds this useful:
git clone git://github.com/mongodb/mongo.git
cd mongo
git checkout r2.4.6
apt-get install scons libpcap-dev g++
scons mongosniff
build/linux2/normal/mongo/mongosniff --source NET lo 27017
I made a command-line tool to view these logs and to activate the profiler first, without needing other client tools: "mongotail".
To activate the log profiling to level 2:
mongotail databasename -l 2
Then to show the latest 10 queries:
mongotail databasename
You can also use the tool with the -f option to see the changes in "real time":
mongotail databasename -f
And finally, filter the result with egrep to find a particular operation, for example to show only write operations:
mongotail databasename -f | egrep "(INSERT|UPDATE|REMOVE)"
See documentation and installation instructions at https://github.com/mrsarm/mongotail