pg_dump blocked by another transaction using TRANSACTION_REPEATABLE_READ - postgresql

I have a Java program that exports selected data from several tables of a Postgres database. The logic is:
set the isolation level to TRANSACTION_REPEATABLE_READ
select data from table 1 and export them to an external file
select data from table 2 and export them
commit
While the program is running, if I run pg_dump to back up the database, pg_dump is blocked until my program finishes.
I am using REPEATABLE_READ to make sure my exported data is consistent, i.e. not affected by other concurrent transactions. Any suggestions on how to get a consistent set of data without blocking pg_dump?
Thanks

Related

Is a PostgreSQL database accessible during restore?

Is it possible to access, and in particular insert into, a database while it's being restored by pg_restore from a custom-format dump?
If yes, should I care about preventing clients from accessing the database while pg_restore is running, or is the restore operation "transactional", so that after it ends all changes made by clients since its start will be lost?
If you want to keep concurrent sessions out of your database during pg_restore, you'll have to block them with a pg_hba.conf entry.
There is no protection from concurrent sessions inserting or otherwise modifying data while pg_restore is running.

Copy data between two servers that are not connected

I have got 2 postgresql servers on 2 different computers that are not connected.
Each server holds a database with the same schema.
I would like one of the servers to be the master server: this server should store all data that are inserted into both databases.
For that I would like to import regularly (on a daily basis for example) data from one database to the second database.
It implies that I should be able to :
"dump" into file(s) all data that have been stored in the first database since a given date.
import the exported data to the second database
I haven't seen any time/date option in pg_dump/pg_restore commands.
So how could I do that?
NB: data are inserted in the database and never updated.
"I haven't seen any time/date option in pg_dump/pg_restore commands."
There isn't any, and you can't do it that way. You'd have to dump and restore the whole database.
Alternatives are:
Use WAL-based replication. Have the master write WAL to an archive location, using an archive_command. When you want to sync, copy all the new WAL from the master to the replica, which must have been created from a pg_basebackup of the master and must have a suitable recovery.conf (PostgreSQL 12 and later replace recovery.conf with settings in postgresql.conf plus a standby.signal file). The replica will replay the master's WAL to get all the master's recent changes.
Use a custom trigger-based system to record all changes to log tables. COPY those log tables to external files, then copy them to the replica. Use a custom script to apply the log table change records to the main tables.
Add a timestamp column to all your tables. Keep a record of when you last synced changes. Do a \copy (SELECT * FROM sometable WHERE insert_timestamp > 'last_sync_timestamp') TO 'somefile' in psql for each table, probably scripted. Copy the files to the secondary server. There, automate the process of doing a \copy sometable FROM 'somefile' to load the changes from the export files.
In your situation I'd probably do the WAL-based replication. It does mean that the secondary database must be absolutely read-only though.
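The timestamp-column alternative can be scripted along these lines. This is only a minimal sketch: the function names are hypothetical, and the table name, column name, and file name are the placeholders from the answer. It only builds the psql \copy lines; running them through psql is left to the surrounding script.

```python
from datetime import datetime


def build_export_command(table: str, timestamp_column: str,
                         last_sync: datetime, out_file: str) -> str:
    """Build a one-line psql copy meta-command exporting rows newer than last_sync."""
    # Assumption: table and column names come from a trusted, hard-coded list.
    # They are interpolated as-is, so never pass user input here.
    cutoff = last_sync.isoformat(sep=" ")
    return (
        f"\\copy (SELECT * FROM {table} "
        f"WHERE {timestamp_column} > '{cutoff}') TO '{out_file}'"
    )


def build_import_command(table: str, in_file: str) -> str:
    """Build the matching psql copy meta-command for the secondary server."""
    return f"\\copy {table} FROM '{in_file}'"


if __name__ == "__main__":
    last_sync = datetime(2024, 1, 1, 0, 0, 0)  # hypothetical last sync time
    print(build_export_command("sometable", "insert_timestamp", last_sync, "somefile"))
    print(build_import_command("sometable", "somefile"))
```

Because the rows are only ever inserted and never updated, a strict "greater than last sync" filter is enough; the script just has to store the new sync time after each successful export.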

Backup specific tables in AWS RDS Postgres Instance

I have two databases on Amazon RDS, both Postgres. Database 1 and 2
I need to restore an instance from a snapshot of Database 1 for my Staging environment. (Database 2 is my current Staging DB).
However, I want the data from a few of the tables in Database 2 to overwrite the tables in the newly restored snapshot. What is the best way to do this?
When restoring RDS from a Snapshot, a new database instance is created. If you only wish to copy a portion of the snapshot:
Restore the snapshot to a new (temporary) database
Connect to the new database and dump the desired tables using pg_dump
Connect to your staging server and restore the tables using pg_restore, or psql if the dump is in plain SQL format (most probably deleting any matching existing tables first)
Delete the temporary database
By default, pg_dump outputs SQL commands that are then used to recreate tables and restore data. Look at the content of a plain-format dump to understand how the restore process actually works.
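The dump and restore steps above can be sketched as two command lines built in Python. The host names, database name, and table names below are made up for illustration; the flags are standard pg_dump and pg_restore options. The custom format is used here so that pg_restore can read the file, and --clean --if-exists drops the matching tables before recreating them.

```python
import subprocess


def dump_tables_cmd(host: str, dbname: str, tables: list[str], out_file: str) -> list[str]:
    """Build a pg_dump command that dumps only the given tables in custom format."""
    cmd = ["pg_dump", "--host", host, "--dbname", dbname,
           "--format=custom", "--file", out_file]
    for table in tables:
        cmd += ["--table", table]  # --table may be repeated, once per table
    return cmd


def restore_tables_cmd(host: str, dbname: str, dump_file: str) -> list[str]:
    """Build a pg_restore command that replaces any matching existing tables."""
    return ["pg_restore", "--host", host, "--dbname", dbname,
            "--clean", "--if-exists", dump_file]


if __name__ == "__main__":
    # Hypothetical endpoints: a temporary instance restored from the snapshot,
    # and the staging instance that receives the tables.
    dump = dump_tables_cmd("temp-instance.example.com", "appdb",
                           ["table_a", "table_b"], "tables.dump")
    restore = restore_tables_cmd("staging.example.com", "appdb", "tables.dump")
    for cmd in (dump, restore):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Credentials are assumed to come from the usual libpq sources (PGPASSWORD, ~/.pgpass), which is why they do not appear in the commands.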
I hope this still works for someone else.
My team faced a similar issue. We also had 2 Postgres databases, and we also just needed to back up some tables from db1 to db2.
What we did was use an AWS Lambda function written in Python that connects to both databases and checks whether db1.table1 has the same data as db2.table1. If not, the function writes the missing data from db1.table1 into db2.table1. We chose Lambda because we wanted to automate the process, since the main db (let's say db1) is constantly being updated. It also allowed us to back up only our desired tables (let's say 3 tables out of 10) instead of the whole database.
Note: Maybe you want to do these writes using temporary tables to avoid issues with any constraints you have in your tables.
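The answer does not show the Lambda code, so the following is only a guess at the comparison step, assuming rows are fetched as tuples and compared by whole-row equality. The function name is hypothetical, and fetching and inserting the rows (for example with a database driver) is left out.

```python
def missing_rows(source_rows: list[tuple], target_rows: list[tuple]) -> list[tuple]:
    """Return rows present in the source table but absent from the target table.

    Source order is preserved so the result can be inserted as-is.
    """
    seen = set(target_rows)  # tuples are hashable, so membership checks are O(1)
    return [row for row in source_rows if row not in seen]


if __name__ == "__main__":
    db1_table1 = [(1, "alice"), (2, "bob"), (3, "carol")]  # made-up sample data
    db2_table1 = [(1, "alice")]
    print(missing_rows(db1_table1, db2_table1))
```

For large tables, loading both sides into memory may not be practical; comparing by primary key or by a timestamp column would be a lighter variant of the same idea.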

How to restore/rewind my PostgreSQL database

We do nightly full backups of our db and I then use that dump to create my own dev-db. The creation of the dev-db takes roughly 10 minutes, so it's scheduled every morning by cron before I get to work. So I can now work with an almost live db.
But when I'm testing things it would sometimes be convenient to roll back the full db or just some specific tables to the initial backup. Of course I could do the full recreation of the dev-db, but that would make me wait for another 10 minutes before I could run the tests again.
So is there an easy way to restore/rewind the database/table to a specific point in time or from a dump?
I have tried to use pg_restore like this to restore specific tables:
pg_restore -d my-dev-db -n stuff -t tableA -t tableB latest-live-db.dump
I have also tried options like -c and --data-only. But there seem to be several issues here that I did not foresee:
The old data is not automatically removed when the restored data is copied back.
There are several foreign-key constraints that make this impossible (correct me if I'm wrong) without explicitly removing the FKs before the restore and then adding them back again.
PK sequences that get out of order do not concern me at all at this point, but that might be an issue as well.
Edit: more things I tested/looked into:
pg_basebackup
A more brute force alternative to pg_basebackup is to stop the db-server, copy the db-files, then start the db-server.
Both of the alternatives above fail because I have several local databases running in the same cluster and that sums up to a lot of data on disk. There is no way to separate the databases this way! So the file copy action here will not give me any speed gain.
I'm assuming you are asking about a database, not a cluster. The first thing that comes to my mind is to restore the backup to 2 different dbs, one with the dev_db name and the other with another name like dev_db_backup. Then, when you need a fresh db, drop dev_db and rename dev_db_backup to dev_db with
drop database if exists dev_db;
alter database dev_db_backup rename to dev_db;
After that, to have another source to rename from, restore the backup to dev_db_backup again. This could be done by a script so the dropping, renaming and restoring would be automated. As dropping and renaming are instantaneous just start the script and the renaming is done without a need to wait for the new restore.
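The script described above could look roughly like this. It is a sketch under assumptions: the function names are hypothetical, the dump file name is taken from the question, and the statements and commands are only built here, not executed. Note that both DROP DATABASE and ALTER DATABASE ... RENAME fail while other sessions are connected to the database in question.

```python
def quote_ident(name: str) -> str:
    """Quote a database name as an SQL identifier, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def swap_statements(dev: str, backup: str) -> list[str]:
    """SQL to discard the used dev database and promote the pristine copy."""
    return [
        f"DROP DATABASE IF EXISTS {quote_ident(dev)};",
        f"ALTER DATABASE {quote_ident(backup)} RENAME TO {quote_ident(dev)};",
    ]


def restore_commands(backup: str, dump_file: str) -> list[list[str]]:
    """Commands that rebuild the spare copy afterwards (the slow part)."""
    return [
        ["createdb", backup],
        ["pg_restore", "--dbname", backup, dump_file],
    ]


if __name__ == "__main__":
    for statement in swap_statements("dev_db", "dev_db_backup"):
        print(statement)  # run these first, e.g. via psql connected to another database
    for command in restore_commands("dev_db_backup", "latest-live-db.dump"):
        print(" ".join(command))  # then run these in the background
```

The swap statements return almost immediately, so testing can resume on the fresh dev_db while the spare copy is rebuilt in the background.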
If you commonly need repeated restores at intervals of less than 10 minutes, I think you can try to do what you are doing inside a transaction:
begin;
-- alter the db
-- test the alterations
commit; -- or ...
-- rollback;

Postgres: how to start a procedure right after database start?

I have dozens of unlogged tables, and doc says that an unlogged table is automatically truncated after a crash or unclean shutdown.
Based on that, I need to check some tables after database starts to see if they are "empty" and do something about it.
In short, I need to execute a procedure right after the database is started.
What is the best way to do it?
PS: I'm running Postgres 9.1 on Ubuntu 12.04 server.
There is no such feature available (at the time of writing, the latest version was PostgreSQL 9.2). Your only options are:
Start a script from the PostgreSQL init script that polls the database and, when the DB is ready, locks the tables and populates them;
Modify the startup script to use pg_ctl start -w and invoke your script as soon as pg_ctl returns; this has the same race condition but avoids the need to poll;
Teach your application to run a test whenever it opens a new pooled connection to detect this condition, lock the tables, and populate them; or
Don't use unlogged tables for this task if your application can't cope with them being empty when it opens a new connection.
There's been discussion of connect-time hooks on pgsql-hackers but no viable implementation has been posted and merged.
It's possible you could do something like this with PostgreSQL bgworkers, but it'd be a LOT harder than simply polling the DB from a script.
Postgres now has pg_isready for determining if the database is ready.
https://www.postgresql.org/docs/11/app-pg-isready.html
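A polling script built on pg_isready might look like the sketch below. The function names, default host and port, and timeout values are assumptions; pg_isready itself exits with 0 once the server accepts connections, and with a non-zero code while it is starting up or unreachable. What to run after the server is ready (checking and repopulating the unlogged tables) is left as a placeholder.

```python
import subprocess
import time


def pg_isready_exit_code(host: str = "localhost", port: int = 5432) -> int:
    """Run pg_isready quietly and return its exit code (0 means accepting connections)."""
    result = subprocess.run(
        ["pg_isready", "-h", host, "-p", str(port), "-q"], check=False
    )
    return result.returncode


def wait_until_ready(check=pg_isready_exit_code, timeout: float = 60.0,
                     interval: float = 1.0, sleep=time.sleep,
                     clock=time.monotonic) -> bool:
    """Poll until check() returns 0; give up once the timeout has elapsed."""
    deadline = clock() + timeout
    while True:
        if check() == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    if wait_until_ready():
        print("server is ready; check and repopulate the unlogged tables here")
    else:
        print("server did not become ready in time")
```

The same race condition mentioned in the answer still applies: clients can connect between the moment the server becomes ready and the moment the script locks the tables.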