Is it possible to access, particularly insert into, a database while it's being restored by pg_restore from a custom format.
If yes, then should I care about preventing clients from accessing the database while pg_restore is running, or the restore operation is "transactional" so that after it ends all changes made by clients since its start will be lost?
If you want to keep concurrent sessions out of your database during pg_restore, you'll have to block them with a pg_hba.conf entry.
There is no protection from concurrent sessions inserting or otherwise modifying data while pg_restore is running.
Related
Is there a way to backup all of the databases on a hyperscaler managed postgres server as of a certain time in order to maintain data consistency between the databases with either pg_dumpall, pg_dump or something else?
Background:
With the utilization of micro-services, an application may have many databases associated to it on a single hyperscaler managed postgres server. The hyperscalers do perform a functional snapshot backup; however, when a hyperscaler managed postgres server is accidentally deleted, the postgres backups are lost as well. These hyperscalers provide locks to prevent accidental deletes of a postgres server and mention that their support teams can be contacted to restore a deleted server, however, we still had a postgres server get deleted. We were able to recover by contacting the hyperscalers support team but would like to have a second way of backing up a hyperscaler managed postgres server.
I realize that the micro-services should be able to auto-recover to a data consistent point but the reality is that many of the micro-services have not been designed nor written to that requirement. I really do not want to get into the aspect of micro-service design and want to retain this to be a DBA backup question.
pg_dumpall will not take a consistent backup across all databases. Each database backup will be consistent, but the snapshots for the backups of the different databases will be taken at different times.
If you need a consistent backup across several databases in a single cluster, use an online file system backup with pg_basebackup.
You can use pg_basebackup which most of the PostgreSQL DBA’s would end up using on a daily basis be it via scripts or manually. This creates the base backup of the database which can help in recovering in multiple situations. This takes an online backup of the database and hence is very useful when being used in production
you can review this for more details
Postgres and the default location for its storage is at my C-drive. I would like to restore a backup to another database but to access it via the same Postgres server instance - the issue is that the size of the DB is too big to be restore on the same c-drive ...would it be possible to tell Postgres that the second database should be restore and placed on another location/drive (while still remaining the first one)? Like database1 at my C-drive and database2 at my D-drive?
Otherwise the second best solution would be to install 2 separate Postgres instances - but that also seems a bit overkill?
That should be entirely achievable, if you've used the postgres pg_dump command.
The pg_dump command does not create the database, so you create it yourself first. Use CREATE TABLESPACE to specify the location.
CREATE TABLESPACE secondspace LOCATION 'D:\postgresdata';
CREATE DATABASE seconddb TABLESPACE secondspace;
This creates an empty database on the D: drive.
Then the standard restore from a pg_dump should work:
psql seconddb < dumpfile
Replication
Sounds like you need database replication.
There are several ways to do this with Postgres, one built-in, and other approaches using add-on libraries.
Built-in replication feature
The built-in replication feature is likely to suit your needs. See the manual. In this approach, you have an instance of Postgres running on your primary server, doing reads and writes of your data. On a second server, an entirely separate computer, you run another instance of Postgres known as the replica. You first set up the replica by doing a full backup of your database on the first server, and restore to the second server.
Next you configure the replication feature. The replica needs to know it is playing the role of a replica rather than a regular database server. And the primary server needs to know the replica exists, so that every database change, every insert, modification, and deletion, can be communicated.
WAL
This communication happens via WAL files.
The Write-Ahead Log (WAL) feature in Postgres is where the database writes all changes first to the WAL, and only after that is complete, then writes to the actual database. In case of crash, power outage, or other failure, the database upon restarting can detect a transaction left incomplete. If incomplete, the transaction is rolled back, and the database server can try again by seeing the "To-Do" list of work listed in the WAL.
Every so often the current WAL is closed, with a new WAL file created to take over the work. With replication enabled, the closed WAL file is copied to the replica. The replica then incorporates that WAL file, to follow the same "To-Do" list of changes as written in that WAL file. So all changes are made to the replica database exactly as they were made to the primary database. Your replica is an exact match to the primary, except for a slight lag in time. The replica is always just one WAL file behind the progress of the primary.
In times of trouble, the replica serves as a warm stand-by. You can shutdown the primary, then tell the replica that it is now the primary. You can even configure the replica to be a hot stand-by, meaning it will automatically take-over when the primary seems to have failed. There are pros and cons to hot stand-by.
Offload read-only queries
As a bonus feature, the replica can be used for read-only queries. If your database is heavily used, you can offload some of the work burden from your primary to the replica. Any queries that do not require the absolute latest information can be shifted by connecting to the replica rather than the original. For example, a quarterly sales report likely does not need the latest data stored in the active WAL file that has not yet arrived on the replica.
Physical replication means all databases are copied
Caveat: This built-in replication feature is physical replication. This means all the changes to the entire Postgres installation (formally known as a cluster, not to be confused with a hardware cluster) is copied to the replica. If you use one Postgres server to server multiple databases, all those databases must be replicated – you cannot pick and choose which get copied over. There may be alternative replication features in the future related to logical replication.
More to learn
I am being brief here. The topics of replication, high-availability, and disaster-recovery are broad and complex, too much for an Answer on Stack Overflow.
Tip: This kind of Question might have been better asked on the sister site, DBA.StackExchange.com.
I have a Postgres database in production environment, and it has millions of records in tables. So I wanted to take a backup using pg_dump for some investigation.
But this database is so busy. So I am afraid if backup operation is caused any server issue like slow down server or crash database etc. as it is busy database.
Can anyone share if there is any risk? And please give some idea about best practice to take a backup from Postgres with no risk.
Running pg_dump will not cause a server crash, but it will add some extra CPU and particularly I/O load. You can test if that is a problem, pg_dump can be canceled any time.
On a busy database, it can also lead to table bloat, because old row versions have to be retained for the duration of pg_dump and cannot be vacuumed.
There are some alternatives:
Run pg_dump against a standby server.
Use pg_basebackup to perform a physical backup. That can be throttled to reduce the I/O load.
A slave database was set up some time ago for the purpose of backing up or replicating a remote database. However I can no longer write to the database using a Delphi based ETL (the ETL works for another database pair, but to date has never been used for this particular pair). The replication database was setup by somebody else who has since left the company. I am reasonably sure this has been setup as a replication database, however the employee who has since left told me that replication never worked for unrelated reasons. Using the ETL we can (using SQL queries) read from the one database, and write back to the replication database, Or should be able to, as it is currently read only.
I have tried:
Maintenance such as VACUUM
Attempt to drop tables and the entire database
Restore a full backup from the master database
None of these work, and I am told the database is read-only.
I have looked at postgresql.conf and see that hot_standby is checked, so I think (but am not 100% certain) that the database is in some sort of replication mode (I've never touched replication as supported by Postgres, so I wouldn't know).
I have checked permissions in pg_hba.conf and see there are some credentials in there for replication. I am not sure whether this activates "replication mode" for the database, or simply means these credentials are for replication only.
I have been through months worth of log files (This has not been working since our IT department upgraded the entire network about 5 months ago). I see the log file contents seen below, repeated over and over with nothing else for months. Note the IP address shown below is listed in the pg_hba.conf file, so credentials are valid.
The database is in recovery mode, as I have found by using:
select pg_is_in_recovery();
This explains to me why it's read only, but why can I not restore databases, or just simply dump the entire database and start again (it's a backup so losing/restoring it is not an issue)?
I was tempted to try modifying the recovery.conf file (which exists) but I read/believe that once recovery has been initiated (which in my case it has) modifying the file will have no effect.
I'm using a legacy version of Postgres: 9.2.9
Any help here would be greatly appreciated, as I have been working solidly on this for more than a day now.
Log File entry (sample):
FATAL: could not connect to the primary server:
FATAL: no pg_hba.conf entry for replication connection from host "192.168.20.2", user "postgres", SSL off
FATAL: could not connect to the primary server: server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing the request.
A couple of options would work for me:
Convert the database from being a read-only replication database, to a standard read/write database or
Dump/drop the entire database so I can create a new one with write capabilities.
It looks like the two database clusters have been set up for replication, but configuration changes on one of the machines broke the replication (changed pg_hba.conf on the primary, changed IP addresses, …).
Here is the way to your desired solutions:
Bringing the standby out of recovery mode: Run
/path/to/pg_ctl promote -D /path/to/data/directory
on the standby as operating system user postgres.
Nuking the standby: Remove the data directory on the standby with rm -rf (or the equivalent on your operating system). Kill all PostgreSQL processes.
Then use initdb to create a new database cluster in the same location.
Assume there is an application in a non-stop loop trying to read from database.
I have tried the following but it does not work:
db2 CONNECT TO SAMPLE
db2 QUIESCE DATABASE IMMEDIATE FORCE CONNECTIONS
db2 TERMINATE
db2 DEACTIVATE DB SAMPLE
db2 BACKUP DATABASE SAMPLE
It seems as if (DEACTIVATE DB) does not do anything since an application in a loop can still read from the database.
I keep getting the error "The database is currently in use" when trying to backup.
You have to make sure there are not applications connected to the database (db2 list applications). Also, you have to make sure the database is not active (db2 list active databases).
Remember that a quiesce or a force applications, is a asynchronous task. It means that you execute any of them, but when the control is returned it does not mean the applications have bee disconnected.
A typical case is a rollback of a batch process, when the rollback takes several minutes.
QUIESCE DATABASE will not prevent new connections from coming in. I believe you have at least two choices:
Use QUIESCE INSTANCE <instance> USER <username> RESTRICTED ACCESS IMMEDIATE FORCE CONNECTIONS. This will force all existing connections and restricts access for new connections. Only the user specified in USER will be able to connect. Presumably, this will be your administrative account.
If this is a no-go, or if you are unable prevent USER from spawning new connections, you may want to (temporarily) UNCATALOG DB and/or disable the DB2COMM registry variable in order to prevent new connections.
HTH.