pg_restore --clean is not dropping and clearing the database - postgresql

I am having an issue with pg_restore --clean not clearing the database.
Or do I misunderstand what --clean does? I expect it to truncate the database tables and reinitialize the indexes/primary keys.
I am using PostgreSQL 9.5 on RDS.
This is the full command we use:
pg_restore --verbose --clean --no-acl --no-owner -h localhost -U superuser -d mydatabase backup.dump
Basically, what is happening is this.
I take a nightly backup of my production DB and restore it to an analytics DB for the analysts to churn through and run their reports on.
I recently found out that the Rails application used to view the reports was complaining that primary keys were missing from the restored analytics database.
So I started investigating the production DB, the analytics DB, etc., which was when I realized that multiple rows with the same primary key existed in the analytics database.
I ran a few short experiments and realized that every time the pg_restore script runs, it inserts duplicate data into the tables. This leads me to think that --clean is not dropping the existing objects before restoring, because if I drop the schema beforehand, I don't get duplicate data.
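(For reference, duplicates like these can be confirmed with a query along the following lines; the table name users and key column id are placeholders:)
psql -h localhost -U superuser -d mydatabase -c "SELECT id, count(*) FROM users GROUP BY id HAVING count(*) > 1;"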

To remove all tables from a database (but keep the database itself), you have two options.
Option 1: Drop the entire schema
You will need to re-create the schema and its permissions afterwards. This is usually only acceptable on development machines.
DROP SCHEMA public CASCADE;
CREATE SCHEMA public;
GRANT ALL ON SCHEMA public TO postgres;
GRANT ALL ON SCHEMA public TO public;
Applications usually use the "public" schema. You may encounter other schema names when working with a (legacy) application's database.
Note that for Rails applications, dropping and recreating the database itself is usually fine in development. You can use bin/rake db:drop db:create for that.
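Applied to the restore workflow from the question, a minimal sketch could look like this (connection details taken from the question; adjust to your environment). Note that once the schema has been dropped, pg_restore no longer needs --clean:
psql -h localhost -U superuser -d mydatabase -c "DROP SCHEMA public CASCADE; CREATE SCHEMA public; GRANT ALL ON SCHEMA public TO postgres; GRANT ALL ON SCHEMA public TO public;"
pg_restore --verbose --no-acl --no-owner -h localhost -U superuser -d mydatabase backup.dump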
Option 2: Drop each table individually
Prefer this for production or staging servers. Permissions may be managed by your operations team, and you do not want to be the one who messed up permissions on a shared database cluster.
The following SQL code will find all table names and execute a DROP TABLE statement for each.
DO $$
DECLARE
    r RECORD;
BEGIN
    -- Loop over every table in the current schema and drop it.
    FOR r IN (SELECT tablename FROM pg_tables WHERE schemaname = current_schema()) LOOP
        -- DROP TABLE IF EXISTS instead of DROP TABLE, so the loop survives
        -- tables that an earlier CASCADE has already removed.
        EXECUTE 'DROP TABLE IF EXISTS ' || quote_ident(r.tablename) || ' CASCADE';
    END LOOP;
END $$;
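One way to use this block (a sketch; the file name is arbitrary) is to save it to a file and run it through psql before the restore:
psql -h localhost -U superuser -d mydatabase -f drop_all_tables.sql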
Original:
https://makandracards.com/makandra/62111-how-to-drop-all-tables-in-postgresql

Related

pg_restore throws error when trying to restore a table with constraints

I am trying to dump tables from a production environment and restore them in a dev one. However, when dumping and restoring this table using the following command:
pg_restore --no-owner --no-acl --clean --if-exists -d database dump_file.dump
I get an error stating that I can't drop that table unless I use something like CASCADE (i.e., dropping all other tables that depend on it). Is there a way to determine which tables would be dropped? Is there a way to state in the pg_dump command that I want to dump the table I'm looking for along with all related tables?
Here's the error raised:
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 4066; 2606 30526 CONSTRAINT table1 pkey user
pg_restore: error: could not execute query: ERROR: cannot drop constraint pkey on table public.table1 because other objects depend on it
DETAIL: constraint id_fkey on table public.dag depends on index public.pkey
HINT: Use DROP ... CASCADE to drop the dependent objects too...
You have a table in the dev database whose primary key has dependent objects and therefore cannot be dropped before the restore. This is proper behavior.
I do not see you dumping/restoring a particular table; you are dumping/restoring the entire database.
If you want to recreate the production database as a dev one, then do:
pg_restore -C --no-owner --no-acl --clean --if-exists -d postgres dump_file.dump
The -C with --clean will DROP DATABASE db_name and then rebuild it from scratch, connecting to the database postgres to do the DROP/CREATE of db_name and then connecting to db_name to load the rest of the objects.
This is the best way to clean out cruft and start at a consistent state.
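For completeness, a sketch of the full round trip with these flags (db_name and the dump file name are placeholders):
pg_dump -Fc -d db_name -f dump_file.dump
# pg_restore connects to "postgres", drops/creates db_name, then loads into it
pg_restore -C --no-owner --no-acl --clean --if-exists -d postgres dump_file.dump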
UPDATE
Update your question with the pg_dump command so it is evident what you are doing.
If you want to see whether a particular table has dependencies, use psql against the original database and run \d the_table to see the dependencies to and from that table. If you tell pg_dump to dump a single table, it will dump just that table; it will not follow dependencies and dump those as well. That is up to you to do.
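As a sketch (proddb and the_table are placeholder names), inspecting and then dumping a single table might look like:
psql -d proddb -c '\d the_table'
pg_dump -d proddb -t the_table -Fc -f the_table.dump
# -t dumps only the named table; dependent/referenced tables must be listed explicitly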
Look into using a schema management tool to do your changes/migrations. I use Sqitch for this.

PostgreSQL 13: create empty copy of database

I have an AWS RDS PostgreSQL 13 server with some databases. I have to create an empty copy of one database (empty means schema (tables, views, functions) + security (users, roles)).
Is pg_dump -s what I am looking for?
Thanks!
pg_dump -d db_name -s. You will also need to do pg_dumpall -g to get the global data, e.g. roles. This will get all global data for the Postgres cluster, so you may have more than you need for the particular database.
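Put together, a sketch of that approach (db_name and newdb are placeholders):
pg_dumpall -g -f globals.sql
pg_dump -d db_name -s -f schema.sql
# on the target server:
psql -d postgres -f globals.sql
createdb newdb
psql -d newdb -f schema.sql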
Postgres allows the use of any existing database on the server as a template when creating a new database. I'm not sure whether pgAdmin gives you the option on the create database dialog but you should be able to execute the following in a query window if it doesn't:
CREATE DATABASE newdb WITH TEMPLATE originaldb OWNER dbuser;
Still, you may get:
ERROR: source database "originaldb" is being accessed by other users
To disconnect all other users from the database, you can use this query:
SELECT pg_terminate_backend(pg_stat_activity.pid) FROM pg_stat_activity WHERE pg_stat_activity.datname = 'originaldb' AND pid <> pg_backend_pid();
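In sequence, run from a connection to another database such as postgres (names as in the statements above), that might look like:
psql -d postgres -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = 'originaldb' AND pid <> pg_backend_pid();"
psql -d postgres -c "CREATE DATABASE newdb WITH TEMPLATE originaldb OWNER dbuser;"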

Postgres generate user grant statements for all objects

We have a dev Postgres DB in which one of the developers has created an application. Is there an existing query that will pull information from the role_table_grants table and generate all the correct statements to move into production? pgAdmin will generate scripts for certain things, but I haven't found a less manual way than writing all the statements by hand based on the role_table_grants table. I'm not asking anyone to dump time into creating it; I just thought I would ask if there are existing migration scripts out there that would help.
Thanks.
Dump the schema to a file; use pg_dump or pg_dumpall with the --schema-only option.
Then use grep to get all the GRANT and REVOKE statements.
On my dev machine, I might do something like this.
$ pg_dump -h localhost -p 5435 -U postgres --schema-only sandbox > sandbox.sql
$ grep "^GRANT\|^REVOKE" sandbox.sql
REVOKE ALL ON SCHEMA public FROM PUBLIC;
REVOKE ALL ON SCHEMA public FROM postgres;
GRANT ALL ON SCHEMA public TO postgres;
[snip]
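To replay those statements on another server, one option (a sketch; the production host and database names are hypothetical) is to pipe the grep output straight into psql:
grep "^GRANT\|^REVOKE" sandbox.sql | psql -h prodhost -U postgres -d proddb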
Perhaps pg_dumpall is what you need, probably with the --schema-only option in order to dump just the schema, not the development data.
If you don't need to move all databases, you can use pg_dumpall --globals-only to dump the roles (which don't belong to any particular database), and then use pg_dump to dump certain individual databases.
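Concretely, that might look like this sketch (mydb and the file names are placeholders):
pg_dumpall --globals-only -f roles.sql
pg_dump -d mydb --schema-only -f mydb_schema.sql
# on the target cluster:
psql -d postgres -f roles.sql
createdb mydb
psql -d mydb -f mydb_schema.sql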

pg_dump vs pg_dumpall? which one to use to database backups?

I tried pg_dump, and then on a separate machine I tried to import the SQL and populate the database. I see:
CREATE TABLE
ERROR: role "prod" does not exist
CREATE TABLE
ERROR: role "prod" does not exist
CREATE TABLE
ERROR: role "prod" does not exist
CREATE TABLE
ERROR: role "prod" does not exist
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
WARNING: no privileges could be revoked for "public"
REVOKE
ERROR: role "postgres" does not exist
ERROR: role "postgres" does not exist
WARNING: no privileges were granted for "public"
GRANT
which means my users, roles, and grant information are not included in the pg_dump output.
On the other hand we have pg_dumpall; I read this conversation, but it does not lead me anywhere.
Question
- Which one should I be using for database backups, pg_dump or pg_dumpall?
- The requirement is that I can take the backup, import it on any machine, and have it work just fine.
The usual process is:
pg_dumpall --globals-only to get users/roles/etc
pg_dump -Fc for each database to get a nice compressed dump suitable for use with pg_restore.
Yes, this kind of sucks. I'd really like to teach pg_dump to embed pg_dumpall output into -Fc dumps, but right now unfortunately it doesn't know how so you have to do it yourself.
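Spelled out, the usual process might look like this sketch (mydb and the file names are placeholders):
# on the source cluster
pg_dumpall --globals-only > globals.sql
pg_dump -Fc mydb > mydb.dump
# on the target cluster
psql -d postgres -f globals.sql
pg_restore -C -d postgres mydb.dump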
Up until PostgreSQL 11 there was also a nasty caveat with this approach: Neither pg_dump, nor pg_dumpall in --globals-only mode would dump user access GRANTs on DATABASEs. So you pretty much had to extract them from the catalogs or filter a pg_dumpall. This is fixed in PostgreSQL 11; see the release notes.
Make pg_dump dump the properties of a database, not just its contents (Haribabu Kommi)
Previously, attributes of the database itself, such as database-level GRANT/REVOKE permissions and ALTER DATABASE SET variable settings, were only dumped by pg_dumpall. Now pg_dump --create and pg_restore --create will restore these database properties in addition to the objects within the database. pg_dumpall -g now only dumps role- and tablespace-related attributes. pg_dumpall's complete output (without -g) is unchanged.
You should also know about physical backups - pg_basebackup, PgBarman and WAL archiving, PITR, etc. These offer much "finer-grained" recovery, down to the minute or the individual transaction. The downside is that they take up more space, are only restorable to the same PostgreSQL version on the same platform, and back up all tables in all databases with no ability to exclude anything.
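As a sketch of the physical route (the host, replication user, and target directory are hypothetical):
pg_basebackup -h prodhost -U replicator -D /var/lib/postgresql/standby -X stream -P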

How to clone a test database from a production one in one single action?

I am looking for a basic script/command that will create a copy of a live database (let's name them mydb and mydb_test, both on the same server).
Requirements
it has to run even if mydb_test already exists and has records
it has to work even if mydb and mydb_test have existing connections
it has to clean the potentially existing database if necessary
Hints:
drop database cannot be used if you have existing connections
The simplest and fastest method to create a complete copy of an existing (live) database is to use CREATE DATABASE with a TEMPLATE:
CREATE DATABASE mydb_test TEMPLATE mydb;
However, there is an important limitation violating your second requirement: the template (source) database cannot have additional connections to it. I quote the manual:
It is possible to create additional template databases, and indeed one
can copy any database in a cluster by specifying its name as the
template for CREATE DATABASE. It is important to understand, however,
that this is not (yet) intended as a general-purpose "COPY DATABASE"
facility. The principal limitation is that no other sessions can be
connected to the source database while it is being copied. CREATE DATABASE
will fail if any other connection exists when it starts; during
the copy operation, new connections to the source database are prevented.
You can terminate all sessions to the template database if you have the necessary privileges with pg_terminate_backend().
To temporarily disallow reconnects, revoke the CONNECT privilege (and GRANT back later).
REVOKE CONNECT ON DATABASE mydb FROM PUBLIC;
-- while connected to another DB - like the default maintenance DB "postgres"
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = 'mydb' -- name of prospective template db
AND pid <> pg_backend_pid(); -- don't kill your own session
CREATE DATABASE mydb_test TEMPLATE mydb;
GRANT CONNECT ON DATABASE mydb TO PUBLIC; -- only if they had it before
In versions before Postgres 9.2 use procpid instead of pid:
How to drop a PostgreSQL database if there are active connections to it?
Related:
Force drop db while others may be connected
If you cannot afford to terminate concurrent sessions, go with piping the output of pg_dump to psql like has been suggested by other answers already.
That's what I was looking for, but I had to compile it myself :P
I only wish I knew a way to keep the same owner without having to put it inside the script.
#!/bin/bash
DB_SRC=conf
DB_DST=conf_test
DB_OWNER=confuser
T="$(date +%s)"
# Kick out everyone connected to the target database first.
# (On Postgres < 9.2, use procpid instead of pid.)
psql -c "select pg_terminate_backend(pid) from pg_stat_activity where datname='$DB_DST';" || { echo "disconnect users failed"; exit 1; }
psql -c "drop database if exists $DB_DST;" || { echo "drop failed"; exit 1; }
psql -c "create database $DB_DST owner $DB_OWNER;" || { echo "create failed"; exit 1; }
# Copy the source database into the freshly created target.
pg_dump "$DB_SRC" | psql "$DB_DST" || { echo "dump/restore failed"; exit 1; }
T="$(($(date +%s)-T))"
echo "Time in seconds: ${T}"
Since you didn't say it was a problem to drop objects in the database, I think running pg_dump with the --clean option will do what you want. You can pipe the output of pg_dump into psql for this sort of thing.
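For example, using the database names from the question:
pg_dump --clean mydb | psql mydb_test
Note that this drops and recreates the objects inside mydb_test, but it will still fail on concurrent connections, which the script above addresses with pg_terminate_backend().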
Are you looking into Hot Standby with Streaming Replication here?