PostgreSQL - using log shipping to incrementally update a remote read-only slave - postgresql

My company's website uses a PostgreSQL database. In our data center we have a master DB and a few read-only slave DB's, and we use Londiste for continuous replication between them.
I would like to setup another read-only slave DB for reporting purposes, and I'd like this slave to be in a remote location (outside the data center). This slave doesn't need to be 100% up-to-date. If it's up to 24 hours old, that's fine. Also, I'd like to minimize the load I'm putting on the master DB. Since our master DB is busy during the day and idle at night, I figure a good idea (if possible) is to get the reporting slave caught up once each night.
I'm thinking about using log shipping for this, as described on
http://www.postgresql.org/docs/8.4/static/continuous-archiving.html
My plan is:
Setup WAL archiving on the master DB
Produce a full DB snapshot and copy it to the remote location
Restore the DB and get it caught up
Go into steady state where:
DAYTIME -- the DB falls behind but people can query it
NIGHT -- I copy over the day's worth of WAL files and get the DB caught up
Note: the key here is that I only need to copy a full DB snapshot one time. Thereafter I should only have to copy a day's worth of WAL files in order to get the remote slave caught up again.
Since I haven't done log-shipping before I'd like some feedback / advice.
Will this work? Does PostgreSQL support this kind of repeated recovery?
Do you have other suggestions for how to set up a remote semi-fresh read-only slave?
thanks!
--S

Your plan should work.
As Charles says, warm standby is another possible solution. It's supported since 8.2 and has relatively low performance impact on the primary server.
Warm Standby is documented in the Manual: PostgreSQL 8.4 Warm Standby
The short procedure for configuring a
standby server is as follows. For full
details of each step, refer to
previous sections as noted.
Set up primary and standby systems as near identically as possible,
including two identical copies of
PostgreSQL at the same release level.
Set up continuous archiving from the primary to a WAL archive located
in a directory on the standby server.
Ensure that archive_mode,
archive_command and archive_timeout
are set appropriately on the primary
(see Section 24.3.1).
Make a base backup of the primary server (see Section 24.3.2), and load
this data onto the standby.
Begin recovery on the standby server from the local WAL archive,
using a recovery.conf that specifies a
restore_command that waits as
described previously (see Section
24.3.3).
To achieve only nightly syncs, your archive_command should exit with a non-zero exit status during daytime.
Additional Informations:
Postgres Wiki about Warm Standby
Blog Post Warm Standby Setup

9.0's built-in WAL streaming replication is designed to accomplish something that should meet your goals -- a warm or hot standby that can accept read-only queries. Have you considered using it, or are you stuck on 8.4 for now?
(Also, the upcoming 9.1 release is expected to include an updated/rewritten version of pg_basebackup, a tool for creating the initial backup point for a fresh slave.)
Update: PostgreSQL 9.1 will include the ability to pause and resume streaming replication with a simple function call on the slave.

Related

Postgres and multiple locations of data storage

Postgres and the default location for its storage is at my C-drive. I would like to restore a backup to another database but to access it via the same Postgres server instance - the issue is that the size of the DB is too big to be restore on the same c-drive ...would it be possible to tell Postgres that the second database should be restore and placed on another location/drive (while still remaining the first one)? Like database1 at my C-drive and database2 at my D-drive?
Otherwise the second best solution would be to install 2 separate Postgres instances - but that also seems a bit overkill?
That should be entirely achievable, if you've used the postgres pg_dump command.
The pg_dump command does not create the database, so you create it yourself first. Use CREATE TABLESPACE to specify the location.
CREATE TABLESPACE secondspace LOCATION 'D:\postgresdata';
CREATE DATABASE seconddb TABLESPACE secondspace;
This creates an empty database on the D: drive.
Then the standard restore from a pg_dump should work:
psql seconddb < dumpfile
Replication
Sounds like you need database replication.
There are several ways to do this with Postgres, one built-in, and other approaches using add-on libraries.
Built-in replication feature
The built-in replication feature is likely to suit your needs. See the manual. In this approach, you have an instance of Postgres running on your primary server, doing reads and writes of your data. On a second server, an entirely separate computer, you run another instance of Postgres known as the replica. You first set up the replica by doing a full backup of your database on the first server, and restore to the second server.
Next you configure the replication feature. The replica needs to know it is playing the role of a replica rather than a regular database server. And the primary server needs to know the replica exists, so that every database change, every insert, modification, and deletion, can be communicated.
WAL
This communication happens via WAL files.
The Write-Ahead Log (WAL) feature in Postgres is where the database writes all changes first to the WAL, and only after that is complete, then writes to the actual database. In case of crash, power outage, or other failure, the database upon restarting can detect a transaction left incomplete. If incomplete, the transaction is rolled back, and the database server can try again by seeing the "To-Do" list of work listed in the WAL.
Every so often the current WAL is closed, with a new WAL file created to take over the work. With replication enabled, the closed WAL file is copied to the replica. The replica then incorporates that WAL file, to follow the same "To-Do" list of changes as written in that WAL file. So all changes are made to the replica database exactly as they were made to the primary database. Your replica is an exact match to the primary, except for a slight lag in time. The replica is always just one WAL file behind the progress of the primary.
In times of trouble, the replica serves as a warm stand-by. You can shutdown the primary, then tell the replica that it is now the primary. You can even configure the replica to be a hot stand-by, meaning it will automatically take-over when the primary seems to have failed. There are pros and cons to hot stand-by.
Offload read-only queries
As a bonus feature, the replica can be used for read-only queries. If your database is heavily used, you can offload some of the work burden from your primary to the replica. Any queries that do not require the absolute latest information can be shifted by connecting to the replica rather than the original. For example, a quarterly sales report likely does not need the latest data stored in the active WAL file that has not yet arrived on the replica.
Physical replication means all databases are copied
Caveat: This built-in replication feature is physical replication. This means all the changes to the entire Postgres installation (formally known as a cluster, not to be confused with a hardware cluster) is copied to the replica. If you use one Postgres server to server multiple databases, all those databases must be replicated – you cannot pick and choose which get copied over. There may be alternative replication features in the future related to logical replication.
More to learn
I am being brief here. The topics of replication, high-availability, and disaster-recovery are broad and complex, too much for an Answer on Stack Overflow.
Tip: This kind of Question might have been better asked on the sister site, DBA.StackExchange.com.

How to exit out of database recovery mode (currently locked in read-only mode)

A slave database was set up some time ago for the purpose of backing up or replicating a remote database. However I can no longer write to the database using a Delphi based ETL (the ETL works for another database pair, but to date has never been used for this particular pair). The replication database was setup by somebody else who has since left the company. I am reasonably sure this has been setup as a replication database, however the employee who has since left told me that replication never worked for unrelated reasons. Using the ETL we can (using SQL queries) read from the one database, and write back to the replication database, Or should be able to, as it is currently read only.
I have tried:
Maintenance such as VACUUM
Attempt to drop tables and the entire database
Restore a full backup from the master database
None of these work, and I am told the database is read-only.
I have looked at postgresql.conf and see that hot_standby is checked, so I think (but am not 100% certain) that the database is in some sort of replication mode (I've never touched replication as supported by Postgres, so I wouldn't know).
I have checked permissions in pg_hba.conf and see there are some credentials in there for replication. I am not sure whether this activates "replication mode" for the database, or simply means these credentials are for replication only.
I have been through months worth of log files (This has not been working since our IT department upgraded the entire network about 5 months ago). I see the log file contents seen below, repeated over and over with nothing else for months. Note the IP address shown below is listed in the pg_hba.conf file, so credentials are valid.
The database is in recovery mode, as I have found by using:
select pg_is_in_recovery();
This explains to me why it's read only, but why can I not restore databases, or just simply dump the entire database and start again (it's a backup so losing/restoring it is not an issue)?
I was tempted to try modifying the recovery.conf file (which exists) but I read/believe that once recovery has been initiated (which in my case it has) modifying the file will have no effect.
I'm using a legacy version of Postgres: 9.2.9
Any help here would be greatly appreciated, as I have been working solidly on this for more than a day now.
Log File entry (sample):
FATAL: could not connect to the primary server:
FATAL: no pg_hba.conf entry for replication connection from host "192.168.20.2", user "postgres", SSL off
FATAL: could not connect to the primary server: server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing the request.
A couple of options would work for me:
Convert the database from being a read-only replication database, to a standard read/write database or
Dump/drop the entire database so I can create a new one with write capabilities.
It looks like the two database clusters have been set up for replication, but configuration changes on one of the machines broke the replication (changed pg_hba.conf on the primary, changed IP addresses, …).
Here is the way to your desired solutions:
Bringing the standby out of recovery mode: Run
/path/to/pg_ctl promote -D /path/to/data/directory
on the standby as operating system user postgres.
Nuking the standby: Remove the data directory on the standby with rm -rf (or the equivalent on your operating system). Kill all PostgreSQL processes.
Then use initdb to create a new database cluster in the same location.

WAL level not sufficient for making an online backup

I have tried to do database replication in linux 7.0 red hat using postgresql.
I refered this URL for Confuring http://blog.terminal.com/postgresql-replication-and-backup-methods/ I completed the the steps upto this step
Configuring the slave server
but the step
Initial replication
when I executed this command in Master
-bash-4.2$ psql -c "select pg_start_backup('initial_backup');"
I got Error Like this
ERROR: WAL level not sufficient for making an online backup
HINT: wal_level must be set to "archive" or "hot_standby" at server start.
So kindly let me know where we are wrong.
On your master PostgreSQL server's configuration file <PG_DATA>/postgresql.conf there is parameter called wal_level it should be set to either "archive" or "hot_standby"
Both said to postgres to keep WAL segmetnts for replication server. The difference between the two is rather simple:
"hot_standby" means that your replication server will allow connections for SELECT statements
"archive" means you don't want that possibility.
More to read in this tutorial on streaming replication
Also keep in mind that change of wal_level property needs server restart to take an effect.

DB2 Online Restore but Without Roll Forward?

I read a lot of documentations for db2 restore but I could not find how to perform online restore from the last database backup but without roll forwarding of logs?
I will appreciate command example.
On example my last online backup is made 1st february. I want to do ONLINE RESTORE of that backup but without logs after 1st February (similar with offline restore option WITHOUT ROLL FORWARD).
I am using db2 9.7
Thank you in advance
The database backup contains a snapshot of the tablespaces, and they may not be in stable state. Roll-forward is always required (unless you want to take insane risks by forcing DB2 to start using potentially corrupt data) to reach the nearest stable state.
If you are asking your question because you want manageable database backup dumps without having to worry about shipping logs etc, use the INCLUDE LOGS option when taking the backup. It will include in the backup file the minimum set of transaction logs that would be required for reaching stable state. When restoring you could then use the LOGS to extract them and then ROLLFORWARD DATABASE for the required typical 0-x seconds (depending on your database transactions).
A lazy dba would probably just use the RECOVER DB SAMPLE TO 2013-02-01-00.00.00 and allow the DB2 to worry about all the details. It will automatically fetch the required database backup and transaction files (even from the backup tapes etc if you set them up correctly), and do everything for you - as long as you don't attempt to manually manage them.

Incremental backups from server to local machine

My live site is using mongodb to store user activities on the site.
I am having a single server running monogdb. I cant afford a second server for master slave replication.
my problem is i want to take the dump of server's mongodb database everyday and restore it to my local machine so that i can query on my local machine.I know how to dump and restore but the issue is every day i have to dump the entire database from server and restore it from the scratch in my local machine ..it takes a lot of time.
so my question is ..is there any way to have incremental backup in mongodb so that i have to dump and restore only single day data as it will take less time.
i do not know much about mongodb, but i have an idea.
i think you can introduce your local mongodb instance as a slave of master production db, and make slave only writable if possible, for preventing live system making selects from your local.
this way can work because slaves keeps track of master writes and deletes and try to make themselves as a copy of master.
And there is a good reason to do that is a slave doesn't have to be online always, when it becomes online, slave will check masters list (this list lenght like 1hour or 1 day is configurable at master) and copy datas from master as quick as possible.
Once you dump master to your local, then you can backup your data twice a day with this method i think.