execution of pg_dump on a standby server - postgresql

I am using physical replication with 1 standby in postgres v.14.4. The max_standby_archive_delay and max_standby_streaming_delay parameters are configured by default at 30s. if a pg_dump is executed on the standby side will the snapshot taken during data export be consistent despite small values for these parameters.

The pg_dump will be consistent, but obviously not at exactly the same point in time as one run simultaneously on your primary.
Note that longer-running transactions can be aborted on the standby if your replication configuration is set up to do so.
Clarification: Yes, of course the dump as a whole will be consistent - it would be useless otherwise. The program executes all its queries in a single snapshot. If you run it a second time, the second snapshot will be different from the first, but each dump will be internally consistent.

Related

postgres logical replication starting from given LSN

Postgres logical replication initial synchronization is very slow process, especially if original database is quite big.
I am wondering if it possible to start replication from given LSN?
The desired work flow will be
obtain current LSN from source database
create logical dump of desired objects in source database
restore dump on the target database
start logical replication from LSN acquired in step 1
I did not find any docs allowing step 4, does anybody know if it possible?
The documentation gives you a hint:
When a new replication slot is created using the streaming replication interface (see CREATE_REPLICATION_SLOT), a snapshot is exported (see Section 9.27.5), which will show exactly the state of the database after which all changes will be included in the change stream. This can be used to create a new replica by using SET TRANSACTION SNAPSHOT to read the state of the database at the moment the slot was created. This transaction can then be used to dump the database's state at that point in time, which afterwards can be updated using the slot's contents without losing any changes.
So the steps would be:
Start a replication connection to the database:
psql "dbname=yourdatabasename replication=database"
Create a replication slot and copy the snapshot name from the output. It is important to leave the connection open until the next step, otherwise the snapshot will cease to exist
CREATE_REPLICATION_SLOT slot_name LOGICAL pgoutput;
Dump the database at the snapshot with the following command. You can close the replication connection once that has started.
pg_dump --snapshot=snapshotname [...]
Restore the dump to the target database.
Start replication using the replication slot.

Postgres and multiple locations of data storage

Postgres and the default location for its storage is at my C-drive. I would like to restore a backup to another database but to access it via the same Postgres server instance - the issue is that the size of the DB is too big to be restore on the same c-drive ...would it be possible to tell Postgres that the second database should be restore and placed on another location/drive (while still remaining the first one)? Like database1 at my C-drive and database2 at my D-drive?
Otherwise the second best solution would be to install 2 separate Postgres instances - but that also seems a bit overkill?
That should be entirely achievable, if you've used the postgres pg_dump command.
The pg_dump command does not create the database, so you create it yourself first. Use CREATE TABLESPACE to specify the location.
CREATE TABLESPACE secondspace LOCATION 'D:\postgresdata';
CREATE DATABASE seconddb TABLESPACE secondspace;
This creates an empty database on the D: drive.
Then the standard restore from a pg_dump should work:
psql seconddb < dumpfile
Replication
Sounds like you need database replication.
There are several ways to do this with Postgres, one built-in, and other approaches using add-on libraries.
Built-in replication feature
The built-in replication feature is likely to suit your needs. See the manual. In this approach, you have an instance of Postgres running on your primary server, doing reads and writes of your data. On a second server, an entirely separate computer, you run another instance of Postgres known as the replica. You first set up the replica by doing a full backup of your database on the first server, and restore to the second server.
Next you configure the replication feature. The replica needs to know it is playing the role of a replica rather than a regular database server. And the primary server needs to know the replica exists, so that every database change, every insert, modification, and deletion, can be communicated.
WAL
This communication happens via WAL files.
The Write-Ahead Log (WAL) feature in Postgres is where the database writes all changes first to the WAL, and only after that is complete, then writes to the actual database. In case of crash, power outage, or other failure, the database upon restarting can detect a transaction left incomplete. If incomplete, the transaction is rolled back, and the database server can try again by seeing the "To-Do" list of work listed in the WAL.
Every so often the current WAL is closed, with a new WAL file created to take over the work. With replication enabled, the closed WAL file is copied to the replica. The replica then incorporates that WAL file, to follow the same "To-Do" list of changes as written in that WAL file. So all changes are made to the replica database exactly as they were made to the primary database. Your replica is an exact match to the primary, except for a slight lag in time. The replica is always just one WAL file behind the progress of the primary.
In times of trouble, the replica serves as a warm stand-by. You can shutdown the primary, then tell the replica that it is now the primary. You can even configure the replica to be a hot stand-by, meaning it will automatically take-over when the primary seems to have failed. There are pros and cons to hot stand-by.
Offload read-only queries
As a bonus feature, the replica can be used for read-only queries. If your database is heavily used, you can offload some of the work burden from your primary to the replica. Any queries that do not require the absolute latest information can be shifted by connecting to the replica rather than the original. For example, a quarterly sales report likely does not need the latest data stored in the active WAL file that has not yet arrived on the replica.
Physical replication means all databases are copied
Caveat: This built-in replication feature is physical replication. This means all the changes to the entire Postgres installation (formally known as a cluster, not to be confused with a hardware cluster) is copied to the replica. If you use one Postgres server to server multiple databases, all those databases must be replicated – you cannot pick and choose which get copied over. There may be alternative replication features in the future related to logical replication.
More to learn
I am being brief here. The topics of replication, high-availability, and disaster-recovery are broad and complex, too much for an Answer on Stack Overflow.
Tip: This kind of Question might have been better asked on the sister site, DBA.StackExchange.com.

Mongodump while writing

Is it safe to run mongodump against running server with many writes per second? Is it possible to get corrupted dump doing in this way?
From here:
Use --oplog to capture incoming write operations during the mongodump operation to ensure that the backups reflect a consistent data state.
Does it mean that no matter how many writes in database dump will be consistent?
If I ran mongodump --oplog at 1AM and it finished at 2AM then I run mongorestore --oplogReplay what state will I get?
From here:
However, the use of mongodump and mongorestore as a backup strategy can be problematic for sharded clusters and replica sets.
but why? I had replica set of 1 primary and 2 secondary. What the problem to run mongodump against one of secondary? It should same as primary (except replication lag difference).
The docs are quite clear about it:
--oplog
Creates a file named oplog.bson as part of the mongodump output. The oplog.bson file, located in the top level of the output directory, contains oplog entries that occur during the mongodump operation. This file provides an effective point-in-time snapshot of the state of a mongod instance. To restore to a specific point-in-time backup, use the output created with this option in conjunction with mongorestore --oplogReplay.
Without --oplog, if there are write operations during the dump operation, the dump will not reflect a single moment in time. Changes made to the database during the update process can affect the output of the backup.
--oplog has no effect when running mongodump against a mongos instance to dump the entire contents of a sharded cluster. However, you can use --oplog to dump individual shards.
Without --oplog you still get a valid dump, just a bit inconsistent - some of the writes done between 1 AM and 2 AM will be missing.
With --oplog you have the oplog file captured at 2 AM. The dump remains inconsistent, and replaying the oplog on restore fixes this issue.
The problems dumping the sharded clusters deserve a dedicated page in the docs. Essentially because of complexity to synchronise backups of all nodes:
To create backups of a sharded cluster, you will stop the cluster balancer, take a backup of the config database, and then take backups of each shard in the cluster using mongodump to capture the backup data. To capture a more exact moment-in-time snapshot of the system, you will need to stop all application writes before taking the filesystem snapshots; otherwise the snapshot will only approximate a moment in time.
There are no problems to dump replica set.

Unable to elect primary after secondary taken offline Mongo

I have a replica setup with 1 Arbiter and 3 Mongo Databases.
2 of the databases (db1 and db2) I have given an equal priority of becoming primary, and the third one (db3) I have a priority of 0.
I am trying to take db3 offline to copy the data to another server, but every time I run db.shutdownServer() in db3, it causes db1 and db2 to become secondaries, and they remain stuck in this configuration.
It's my understanding that reelection only takes place when Primaries become unavailable.
Am I missing something??
So what was actually happening was I had added 3 other databases (shutdown) in hidden mode which were going to become my next replica set. Apparently Mongo has a setting where if the number of shutdown dbs > running dbs, the replica set goes into read-only mode, so obviously every time I would shutdown db3 this would trigger this to take place.

PostgreSQL - using log shipping to incrementally update a remote read-only slave

My company's website uses a PostgreSQL database. In our data center we have a master DB and a few read-only slave DB's, and we use Londiste for continuous replication between them.
I would like to setup another read-only slave DB for reporting purposes, and I'd like this slave to be in a remote location (outside the data center). This slave doesn't need to be 100% up-to-date. If it's up to 24 hours old, that's fine. Also, I'd like to minimize the load I'm putting on the master DB. Since our master DB is busy during the day and idle at night, I figure a good idea (if possible) is to get the reporting slave caught up once each night.
I'm thinking about using log shipping for this, as described on
http://www.postgresql.org/docs/8.4/static/continuous-archiving.html
My plan is:
Setup WAL archiving on the master DB
Produce a full DB snapshot and copy it to the remote location
Restore the DB and get it caught up
Go into steady state where:
DAYTIME -- the DB falls behind but people can query it
NIGHT -- I copy over the day's worth of WAL files and get the DB caught up
Note: the key here is that I only need to copy a full DB snapshot one time. Thereafter I should only have to copy a day's worth of WAL files in order to get the remote slave caught up again.
Since I haven't done log-shipping before I'd like some feedback / advice.
Will this work? Does PostgreSQL support this kind of repeated recovery?
Do you have other suggestions for how to set up a remote semi-fresh read-only slave?
thanks!
--S
Your plan should work.
As Charles says, warm standby is another possible solution. It's supported since 8.2 and has relatively low performance impact on the primary server.
Warm Standby is documented in the Manual: PostgreSQL 8.4 Warm Standby
The short procedure for configuring a
standby server is as follows. For full
details of each step, refer to
previous sections as noted.
Set up primary and standby systems as near identically as possible,
including two identical copies of
PostgreSQL at the same release level.
Set up continuous archiving from the primary to a WAL archive located
in a directory on the standby server.
Ensure that archive_mode,
archive_command and archive_timeout
are set appropriately on the primary
(see Section 24.3.1).
Make a base backup of the primary server (see Section 24.3.2), and load
this data onto the standby.
Begin recovery on the standby server from the local WAL archive,
using a recovery.conf that specifies a
restore_command that waits as
described previously (see Section
24.3.3).
To achieve only nightly syncs, your archive_command should exit with a non-zero exit status during daytime.
Additional Informations:
Postgres Wiki about Warm Standby
Blog Post Warm Standby Setup
9.0's built-in WAL streaming replication is designed to accomplish something that should meet your goals -- a warm or hot standby that can accept read-only queries. Have you considered using it, or are you stuck on 8.4 for now?
(Also, the upcoming 9.1 release is expected to include an updated/rewritten version of pg_basebackup, a tool for creating the initial backup point for a fresh slave.)
Update: PostgreSQL 9.1 will include the ability to pause and resume streaming replication with a simple function call on the slave.