I'm using a real Mongo instance for my integration tests.
On every test, I call mongorestore my_db_state --drop because I need a clean db to start the next test.
On my tiny db (with only one test user account) it takes 180 ms to run mongorestore my_db_state (without --drop) and 455 ms with --drop.
I don't understand why the drop takes so long, but more importantly: how can I speed this up?
I may have a bunch of open connections on the mongo server because I never close them from my Flask server; they probably should be closed once each test is over, but I'm not sure whether that's something I need to care about here.
Is there a faster way to restore a snapshot to its original state for testing purposes?
Drop and recreate collections from your test runner using persistent MongoClient instances.
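For example, a minimal sketch of that approach with pymongo, assuming a hypothetical database named test_db and hypothetical collections users/sessions (adjust to whatever your tests actually touch):

    # Sketch only: one persistent client reused across tests, so each reset pays
    # only for the drops and a tiny seed insert instead of a full mongorestore.
    from pymongo import MongoClient

    client = MongoClient("localhost", 27017)   # created once, reused by every test
    db = client["test_db"]                     # placeholder database name

    def reset_db():
        for name in ("users", "sessions"):     # placeholder collection names
            db.drop_collection(name)
        # Re-create the tiny fixture directly instead of restoring a dump.
        db.users.insert_one({"email": "test@example.com", "name": "Test User"})

Calling reset_db() from a pytest fixture (or setUp) before each test should keep the per-test cost well below a full mongorestore run for a database this small.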
I have a cron job that runs on a stateless server. In this cron job, I am trying to take a snapshot of my Postgres Cloud SQL database on GCP (PRODUCTION_DATABASE), save it to S3, and then load it into my staging, qa-1 and dev databases. The problem is that one table, call it LARGE_TABLE, needs to be shrunk because its size is growing rapidly, causing problems and exceeding timeouts. Does anyone have any advice on how to get this done?
I tried running the cloud_sql_proxy so I could run pg_dump through it, but had no luck with that method. Is there a way I can truncate one table and still make a backup?
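Not a definitive answer, but for illustration: pg_dump can skip the data of a single table with --exclude-table-data while still dumping its schema. A minimal sketch via subprocess, assuming the Cloud SQL Auth Proxy is already listening on localhost:5432 and reusing the LARGE_TABLE / PRODUCTION_DATABASE names from the question:

    # Sketch only: dump everything except LARGE_TABLE's rows (its DDL is kept).
    import subprocess

    subprocess.run(
        [
            "pg_dump",
            "--host=localhost", "--port=5432",
            "--username=postgres",                 # assumption: adjust to your user
            "--exclude-table-data=LARGE_TABLE",    # skip the big table's data
            "--format=custom",
            "--file=production_snapshot.dump",
            "PRODUCTION_DATABASE",
        ],
        check=True,
    )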
I have a basic Postgres instance in GCP with the following configuration:
https://i.stack.imgur.com/5FFZ6.png
I have a simple Python script using psycopg2 that does inserts in a loop to the database, connected through the Cloud SQL Auth Proxy.
All the inserts are done in the same transaction.
The problem is that it is taking a couple of hours to insert around 200,000 records.
When I run the script on a local database it takes a couple of seconds.
What could be causing this huge difference?
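For illustration, a sketch of the per-row loop described, next to a batched variant using psycopg2.extras.execute_values; the table, columns and connection details are assumptions. Over the Auth Proxy every execute() is a network round trip, which is a likely contributor to the gap against a local database:

    # Sketch only: hypothetical table records(id, payload).
    import psycopg2
    from psycopg2.extras import execute_values

    conn = psycopg2.connect(host="127.0.0.1", port=5432,
                            dbname="mydb", user="postgres", password="secret")
    cur = conn.cursor()
    rows = [(i, "payload %d" % i) for i in range(200000)]

    # Per-row version: 200,000 round trips, even inside a single transaction.
    # for r in rows:
    #     cur.execute("INSERT INTO records (id, payload) VALUES (%s, %s)", r)

    # Batched version: the same rows in far fewer round trips.
    execute_values(cur, "INSERT INTO records (id, payload) VALUES %s", rows, page_size=1000)
    conn.commit()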
I've installed a new MongoDB server and now I want to import data from the old one. My MongoDB stores monitoring data, and it's a bit problematic to export the data from the old database (it's over 10 GB), so I thought it might be possible to import directly from the old DB, but I haven't found how to do that with mongoimport.
The export/import would be the fastest option.
But if you really want to bypass it, you can use the new server as a replica of the old one and wait for full replication.
It takes longer, but it's an easy way to set up a full copy without impacting the first one.
Follow this:
http://docs.mongodb.org/manual/tutorial/convert-standalone-to-replica-set/
And then, once it's done, change configuration again.
It's easier than it seems, but I recommend you do a dry run with a sample database before doing it...
Note that another benefit is that the new replica will probably be smaller on disk than the original database, because MongoDB is not very good at reclaiming the space of deleted documents.
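For orientation, a rough sketch of those steps driven through pymongo admin commands (the shell helpers rs.initiate() and rs.add() wrap the same commands); the set name and hostnames are placeholders:

    # Sketch only: assumes the OLD mongod has been restarted with --replSet rs0
    # and the NEW (empty) mongod is also running with --replSet rs0.
    from pymongo import MongoClient

    old = MongoClient("old-host", 27017)       # placeholder hostnames throughout

    # Equivalent of rs.initiate(): make the old server a one-member replica set.
    old.admin.command("replSetInitiate", {
        "_id": "rs0",
        "members": [{"_id": 0, "host": "old-host:27017"}],
    })

    # Equivalent of rs.add("new-host:27017"): append a member and reconfigure.
    # The new server then performs an initial sync of the full data set.
    config = old.admin.command("replSetGetConfig")["config"]
    config["members"].append({"_id": 1, "host": "new-host:27017"})
    config["version"] += 1
    old.admin.command("replSetReconfig", config)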
mongoimport/mongoexport operate per collection, so they're not the right tools for this kind of operation. Use mongodump/mongorestore instead.
If the old MongoDB instance can be shut down for this task, you can stop it, copy all of its data files over to the new server's data directory, and then start the new instance.
db.cloneDatabase() can also copy the data directly from the old instance to the new one, though it will be slower than copying the data files directly.
You can use mongodump and pipe directly to the new database with mongorestore like:
mongodump --archive --db=test | mongorestore --archive --nsFrom='test.*' --nsTo='examples.*'
Add --host, --port and --username to mongorestore to connect to the remote db.
db.cloneDatabase() has been deprecated for a while.
You can use the copydb command described here.
Copies a database from a remote host to the current host or copies a database to another database within the current host.
copydb runs on the destination mongod instance, i.e. the host receiving the copied data.
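For example, issued from the destination server with pymongo; hostnames and database names are placeholders, and note that copydb itself was removed in MongoDB 4.2, so this only applies to older servers:

    # Sketch only: run against the DESTINATION mongod; it pulls the data from fromhost.
    from pymongo import MongoClient

    dest = MongoClient("new-host", 27017)            # placeholder hostname
    dest.admin.command("copydb",
                       fromhost="old-host:27017",    # source server
                       fromdb="monitoring",          # placeholder database names
                       todb="monitoring")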
I want to execute a long-running stored procedure on PostgreSQL 9.3. Our database server is (for the sake of this question) guaranteed to be running stable, but the machine calling the stored procedure can be shut down at any second (Heroku dynos get cycled every 24h).
Is there a way to run the stored procedure 'detached' on PostgreSQL? I do not care about its output. Can I run it asynchronously and then let the database server keep working on it while I close my database connection?
We're using Python and the psycopg2 driver, but I don't care so much about the implementation itself. If the possibility exists, I can figure out how to call it.
I found notes on the asynchronous support and the aiopg library and I'm wondering if there's something in those I could possibly use.
No, you can't run a function that keeps on running after the connection you started it from terminates. When the PostgreSQL server notices that the connection has dropped, it will terminate the function and roll back the open transaction.
With PostgreSQL 9.3 or 9.4 it'd be possible to write a simple background worker to run procedures for you via a queue table, but this requires the ability to compile and install new C extensions into the server - something you can't do on Heroku.
Try to reorganize your function into smaller units of work that can be completed individually. Huge, long-running functions are problematic for other reasons, and should be avoided even if unstable connections aren't a problem.
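As a rough illustration of the "smaller units of work" idea, a sketch with psycopg2 and a hypothetical jobs table: each batch is claimed and committed separately, so a recycled dyno only loses the batch that was in flight:

    # Sketch only: a hypothetical jobs(id, done) table drives the work.
    import psycopg2

    conn = psycopg2.connect("dbname=mydb user=postgres")   # placeholder DSN
    while True:
        with conn, conn.cursor() as cur:                   # one transaction per batch
            cur.execute("""
                UPDATE jobs SET done = true
                WHERE id IN (SELECT id FROM jobs WHERE NOT done ORDER BY id LIMIT 100)
                RETURNING id
            """)
            if cur.rowcount == 0:
                break                                      # nothing left to process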
I have dozens of unlogged tables, and the docs say that an unlogged table is automatically truncated after a crash or unclean shutdown.
Based on that, I need to check some tables after the database starts to see if they are "empty" and do something about it if so.
So, in short, I need to execute a procedure right after the database is started.
What is the best way to do that?
PS: I'm running Postgres 9.1 on Ubuntu 12.04 server.
There is no such feature available (at the time of writing, the latest version was PostgreSQL 9.2). Your only options are:
Start a script from the PostgreSQL init script that polls the database and, once the DB is ready, locks the tables and populates them;
Modify the startup script to use pg_ctl start -w and invoke your script as soon as pg_ctl returns; this has the same race condition as polling (another client could connect before your script runs) but avoids the need to poll;
Teach your application to run a test whenever it opens a new pooled connection to detect this condition, lock the tables, and populate them (a sketch of this follows at the end of this answer); or
Don't use unlogged tables for this task if your application can't cope with them being empty when it opens a new connection.
There's been discussion of connect-time hooks on pgsql-hackers but no viable implementation has been posted and merged.
It's possible you could do something like this with PostgreSQL bgworkers, but it'd be a LOT harder than simply polling the DB from a script.
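A rough sketch of the connection-time check (the third option in the list above), assuming psycopg2 and a hypothetical unlogged table cache_entries; the seed logic is a placeholder for whatever repopulates your tables:

    # Sketch only: run once per new pooled connection. If the unlogged table is
    # empty (as it will be after a crash or unclean shutdown), lock and reseed it.
    import psycopg2

    def ensure_unlogged_tables_populated(conn):
        with conn, conn.cursor() as cur:
            # Lock so two connections don't race to repopulate the same table.
            cur.execute("LOCK TABLE cache_entries IN EXCLUSIVE MODE")
            cur.execute("SELECT EXISTS (SELECT 1 FROM cache_entries)")
            if not cur.fetchone()[0]:
                # Placeholder seed; replace with your real population logic.
                cur.execute("INSERT INTO cache_entries (key, value) VALUES (%s, %s)",
                            ("initialised", "true"))

    conn = psycopg2.connect("dbname=mydb")                 # placeholder DSN
    ensure_unlogged_tables_populated(conn)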
Postgres now has pg_isready for determining if the database is ready.
https://www.postgresql.org/docs/11/app-pg-isready.html
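For instance, a small wrapper could block on pg_isready and then run the post-start check; host, port and poll interval are assumptions:

    # Sketch only: wait until the server accepts connections, then do the check.
    import subprocess, time

    while subprocess.run(["pg_isready", "--host=localhost", "--port=5432"]).returncode != 0:
        time.sleep(1)          # not ready yet; poll again

    # Server is up: now check/repopulate the unlogged tables (see the sketch above).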