What is the easiest way to generate a script to drop and create all objects in a database? - postgresql

I'm used to working with SQL Server, and SQL Server Management Studio has the option to automatically generate a script to drop and recreate everything in a database (tables/views/procedures/etc.). I find that when developing a new application and writing a bunch of junk in a local database for basic testing, it's very helpful to have the option to just nuke the whole thing and recreate it from a clean slate, so I'm looking for similar functionality in postgres/pgadmin.
PGAdmin has an option to generate a create script for a specific table, but right-clicking each table would be very tedious, and I'm wondering if there's another way to do it.

To recreate a clean, schema-only database you can use the pg_dump client included with a Postgres server install. The options to use are:
-c
--clean
Output commands to clean (drop) database objects prior to outputting the commands for creating them. (Unless --if-exists is also specified, restore might generate some harmless error messages, if any objects were not present in the destination database.)
This option is ignored when emitting an archive (non-text) output file. For the archive formats, you can specify the option when you call pg_restore.
and:
-s
--schema-only
Dump only the object definitions (schema), not data.
This option is the inverse of --data-only. It is similar to, but for historical reasons not identical to, specifying --section=pre-data --section=post-data.
(Do not confuse this with the --schema option, which uses the word “schema” in a different meaning.)
To exclude table data for only a subset of tables in the database, see --exclude-table-data.
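For example, to generate a single script that drops and recreates every object in a database and then replay it against that same database, something along these lines should work (a sketch; the database name mydb and the file name are placeholders):
pg_dump --clean --if-exists --schema-only -d mydb -f recreate_mydb.sql
psql -d mydb -f recreate_mydb.sql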

clean in Flyway
The database migration tool Flyway offers a clean command that drops all objects in the configured schemas.
To quote the documentation:
Clean is a great help in development and test. It will effectively give you a fresh start, by wiping your configured schemas completely clean. All objects (tables, views, procedures, …) will be dropped.
Needless to say: do not use against your production DB!
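As a rough sketch of the command-line invocation (connection details are placeholders; note that on recent Flyway versions clean is disabled by default via the cleanDisabled setting and has to be explicitly enabled):
flyway -url=jdbc:postgresql://localhost:5432/mydb -user=myuser -password=secret clean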

Related

What is the best way to backfill old database data to an existing Postgres database?

A new docker image was recently stood up to replace an existing postgres database. Before the old instance was shut down, a dump of the database was taken with the following command:
pg_dump -h localhost -p 5432 -d *dbname* -U postgres > *dbname*.pgdump
We'd like to concatenate or append this data to the new database in order to "backfill" some older historical data. The database name and schema of the two databases are identical. What is the easiest, safest way to do this? Secondly, does Postgres need to be shut down during the process?
If overlapping primary keys or unique columns have been assigned to the new data, then there will be no clean way to merge them without putting in some work to clean that up. Assuming that hasn't happened...
The current dump file will have CREATE statements for all the objects that already exist. If you replay that file into the current database, you will get a bunch of errors for all those objects. If you don't have it all run in one transaction, then you could simply ignore those errors. But you might also load data in the wrong order and get foreign key violations. Those errors will be mixed in with all the other ones about existing objects, so they might be easy to overlook.
So what I would do is stand up an empty database server and replay your current dump into that. Then retake the pg_dump, but with either -a or --section=data. You should then be able to load that dump into your new database. This has two advantages: it will not dump out CREATE statements which are not needed and would throw errors that need to be ignored, and it should dump the tables in an order which will not cause foreign key violations.
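A sketch of that workflow, assuming a throwaway Postgres instance listening on port 5433 and a scratch database named scratchdb (names, ports and file names are placeholders):
createdb -h localhost -p 5433 -U postgres scratchdb
psql -h localhost -p 5433 -U postgres -d scratchdb -f dbname.pgdump
pg_dump -h localhost -p 5433 -U postgres -d scratchdb --data-only -f dbname_data.sql
psql -h localhost -p 5432 -U postgres -d dbname -f dbname_data.sql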

Managing foreign keys when using pg_restore with multiple dumps

I have a bit of a weird issue. We were trying to create a database baseline for our local environment that has very specific data pre-seeded into it. Our hopes were to make sure that everyone was operating with the same data, making collaboration and reviewing code a bit simpler.
My idea for this was to run a command to dump the database whenever we run a migration or decide a new account is necessary for local dev. The issue with this is the database dump is around 17MB. I'm trying to avoid us having to add a 17MB file to GitHub every time we update the database.
So the best solution I could think of was to set up a script to dump each individual table in the database. This way, if a single table is updated, we'd only be pushing that backup to GitHub, and it would be more along the lines of a ~200 KB file as opposed to 17 MB.
The main issue I'm running into with this is trying to restore the database. With a full dump, handling the foreign keys is relatively simple as it's all done in a single restore command. But with multiple restores, it gets a bit more complicated.
I'm looking to find a way to restore all tables to a database, ignoring triggers and constraints, and then enabling them again once the data has been populated. (or find a way to export the tables based on the order the foreign keys are defined). There are a lot of tables to work with, so doing this manually would be a bit of a task.
I'm also concerned about the relational integrity of the database if I disabled/re-enable constraints. Any help or advice would be appreciated.
Right now I'm running the following on every single table:
pg_dump postgres://user:password@pg:5432/database -t table_name -Fc -Z9 -f /data/www/database/data/table_name.bak
And then this command to restore all backups to the DB.
$data_command = "pg_restore --disable-triggers -d $dbUrl -Fc \"%s\"";
$backups = glob("$directory*.bak");
foreach ($backups as $data_file) {
    // glob() returns full paths, so compare against the base name
    if (basename($data_file) != 'data_roles.bak') {
        exec(sprintf($data_command, $data_file));
    }
}
This obviously doesn't work, as I hit a ton of "relation does not exist" errors. I guess I'm just looking for a better way to accomplish this.
I would separate the table data and the database metadata.
Create a pre-data and a post-data script with
pg_dump --section=pre-data -f pre.sql mydb
pg_dump --section=post-data -f post.sql mydb
Then dump just the data for each table:
pg_dump --section=data --table=tab1 -f tab1.sql mydb
To restore the database, first restore pre.sql, then all the table data, then post.sql.
The pre- and post-data will change often, but they are not large, so that shouldn't be a problem.
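A minimal sketch of that restore order, assuming plain-SQL dumps and a target database named mydb:
psql -d mydb -f pre.sql
psql -d mydb -f tab1.sql    # repeat for each table's data file
psql -d mydb -f post.sql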

How to back up and restore an *entire* PostgreSQL installation?

I would like to back up, and then restore, the contents of an entire postgres installation, including all roles, all databases, everything.
During the restore, the target postgres installation shall be entirely replaced, so that its state ends up exactly as specified in the backup, no matter what is currently on the target installation.
Existing answers include this and this, but none of them meets my requirements because
They use pg_dump, forcing me to list the databases manually. I don't want that. I want all databases.
They suggest pg_dumpall + psql, which doesn't work if the target installation already has (some part of) the tables; typical errors include ERROR: role "myuser" cannot be dropped because some objects depend on it, so the table is not dropped, and the eventual COPY command importing data then fails with e.g. ERROR: duplicate key value violates unique constraint.
They suggest copying on-disk data files, which doesn't work when you want to backup e.g. from version 9.5 and restore into version 9.6.
How can you back up, and restore, everything in a postgres installation, with one command each?
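For reference, the pg_dumpall + psql combination mentioned above is typically run roughly like this (a sketch; as described, replaying it into a non-empty installation still produces the dependency and duplicate-key errors):
pg_dumpall --clean --if-exists -f full_cluster.sql
psql -d postgres -f full_cluster.sql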

pg_restore. What does it do?

I read these docs:
Description
pg_restore is a utility for restoring a PostgreSQL database from an archive created by pg_dump in one of the non-plain-text formats. It will issue the commands necessary to reconstruct the database to the state it was in at the time it was saved. The archive files also allow pg_restore to be selective about what is restored, or even to reorder the items prior to being restored. The archive files are designed to be portable across architectures.
pg_restore can operate in two modes. If a database name is specified, pg_restore connects to that database and restores archive contents directly into the database. Otherwise, a script containing the SQL commands necessary to rebuild the database is created and written to a file or standard output. This script output is equivalent to the plain text output format of pg_dump. Some of the options controlling the output are therefore analogous to pg_dump options.
Obviously, pg_restore cannot restore information that is not present in the archive file. For instance, if the archive was made using the "dump data as INSERT commands" option, pg_restore will not be able to load the data using COPY statements.
but it's still unclear to me whether pg_restore just loads database data or whether it also creates the structure of the database.
It depends on the options you pass, and obviously on the information stored in the dump. If you keep reading through the documentation you will see this option:
--data-only
Restore only the data, not the schema (data definitions). Table data, large objects, and sequence values are restored, if present in the archive.
This option is similar to, but for historical reasons not identical to, specifying --section=data.
That option obviously lets you restore only the data and not the schema; without it, pg_restore also creates the database structure (provided the schema is present in the archive) before loading the data.
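As a rough illustration (assuming a custom-format dump named backup.dump that contains both schema and data, and a target database mydb):
pg_restore -d mydb backup.dump              # recreates the schema objects, then loads the data
pg_restore --data-only -d mydb backup.dump  # loads only the data; the objects must already exist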

Slow database restore from SQL file, considering binary replication to roll back the test database

When using Cucumber with Capybara, I have to load test database data from an SQL dump.
Unfortunately it takes 10 seconds for each scenario, which slows down the tests.
I have found something like: http://wiki.postgresql.org/wiki/Binary_Replication_Tutorial#How_to_Replicate
Do you think binary replication will be quicker than using SQL files?
Is there anything I can do to make the restore quicker (I restore just data, not structure)?
What approaches would you recommend to try?
You could try putting your test data into a "template" database (e.g. mydb_template).
To prepare the test scenario, you simply drop your database using DROP DATABASE mydb and recreate it based on the template: CREATE DATABASE mydb TEMPLATE = mydb_template;
Of course you'll need to connect to e.g. template1 or the postgres database in order to be able to drop mydb.
I think this could be faster than importing a dump.
I recall a discussion on the PG mailing list regarding this approach and some performance problems with large "templates" that were fixed with 9.0.
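A minimal sketch of that reset cycle as shell commands (database names are placeholders; any open connections to mydb have to be closed first):
psql -d postgres -c "DROP DATABASE IF EXISTS mydb;"
psql -d postgres -c "CREATE DATABASE mydb TEMPLATE mydb_template;"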
(I restore just data, not structure)
COPY is always fastest for importing just data. The other answer deals with restoring a whole database.
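For instance, a client-side data-only load via psql's \copy wrapper around COPY might look like this (table and file names are hypothetical):
psql -d mydb -c "\copy my_table FROM 'my_table.csv' WITH (FORMAT csv)"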