I have PostgreSQL 10 installed on my VM. I want to test how recovery is done from WAL files.
I have created a scenario as below.
I have opened a transaction to insert records into a table called employee:
begin;
insert into employee select generate_series(1,200000), 'abc', 1;
I did not commit it, because I want to recover this data from the WAL files via the crash recovery process.
Please let me know which steps need to be taken so that I can recover the lost data that was in memory from the WAL files. Keep in mind that I have no streaming replication; it is a standalone, single machine.
Thanks
Related
I am trying to configure database backups in PostgreSQL with pg_basebackup and WAL logs.
For now I create a full backup once a week and want to back up the WAL logs too. But, as I understand it, PostgreSQL writes them all the time. So how can I copy them and be sure that they are not corrupted?
Thanks
You set archive_command to a shell command that copies the WAL file to a safe archive location, so that burden is mostly on you.
When PostgreSQL runs archive_command, it assumes that the WAL file is not corrupted. Only a PostgreSQL bug or a bug in the storage system could cause a corrupted WAL segment.
There is no better protection against PostgreSQL bugs than always running the latest bugfix release, and you can invest in storage hardware that will at least detect failure.
You can also write your archive_command with a certain amount of paranoia, e.g. by comparing the md5sum of the WAL segment and its archive copy.
Another idea is to write two copies of the WAL file to different storage systems.
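A minimal sketch of such a paranoid archive_command wrapper, assuming a hypothetical archive directory /mnt/wal_archive (PostgreSQL expands %p to the path of the segment and %f to its file name):

#!/bin/bash
# archive_wal.sh -- set in postgresql.conf as:
#   archive_command = '/usr/local/bin/archive_wal.sh %p %f'
src="$1"                       # full path of the WAL segment
dst="/mnt/wal_archive/$2"      # archive copy

# never overwrite an already-archived segment
if [ -e "$dst" ]; then
    exit 1
fi

cp "$src" "$dst" || exit 1

# paranoia: confirm the copy is byte-identical before reporting success
if [ "$(md5sum < "$src" | cut -d' ' -f1)" != "$(md5sum < "$dst" | cut -d' ' -f1)" ]; then
    rm -f "$dst"
    exit 1
fi
exit 0

Any non-zero exit makes PostgreSQL keep the segment in pg_xlog and retry later, so a failed or mismatched copy is never counted as archived. A second cp to a different mount point inside the same script would implement the two-copies idea.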
Current situation
So I have WAL archiving set up to an independent internal hard drive on a data-logging computer running Postgres. The hard drive containing the WAL archives is filling up, and I'd like to remove all the WAL archive files, including the initial base backup, and archive them to external backup drives.
The directory structure is like:
D:/WALBACKUP/ which is the parent folder for all the WAL files (00000110000.CA00000004 etc)
D:/WALBACKUP/BASEBACKUP/ which holds the .tar of the initial base backup
The question I have then is:
Can I safely move literally every single WAL file except the current WAL archive file (000000000001.CA0000.. and so on), including the base backup, to another hard drive? (Note that the database is live and receiving data.)
cheers!
WAL archives
You can use the pg_archivecleanup command to remove WAL from an archive (not pg_xlog) that's not required by a given base backup.
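For example, pointing it at the archive directory and at the backup history file of the oldest base backup you still want to keep (names here are illustrative):

pg_archivecleanup -d /mnt/server/archiverdir 000000010000000000000010.00000020.backup

Everything logically older than the WAL segment named in that .backup file is removed; -d makes it print what it deletes.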
In general I suggest using PgBarman or a similar tool to automate your base backups and WAL retention though. It's easier and less error prone.
pg_xlog
Never remove WAL from pg_xlog manually. If you have too much WAL then:
your wal_keep_segments setting is keeping WAL around;
you have archive_mode on and archive_command set but it isn't working correctly (check the logs);
your checkpoint_segments is ridiculously high so you're just generating too much WAL; or
you have a replication slot (see the pg_replication_slots view) that's preventing the removal of WAL.
You should fix the problem that's causing WAL to be retained. If nothing seems to have happened after changing a setting run a manual CHECKPOINT command.
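To see which of these applies, a few quick checks from psql help (a sketch, assuming a 9.4-or-newer server and superuser access):

# which replication slots exist and how far back they pin WAL
psql -c "SELECT slot_name, slot_type, active, restart_lsn FROM pg_replication_slots;"
# how much WAL the settings force the server to keep
psql -c "SHOW wal_keep_segments;"
psql -c "SHOW archive_mode;"
psql -c "SHOW archive_command;"
# after fixing a setting, force a checkpoint so old segments can be recycled
psql -c "CHECKPOINT;"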
If you have an offline server and need to remove WAL to start it, you can use pg_archivecleanup if you must. It knows how to remove only WAL that isn't needed by the server itself ... but it might break your archive-based backups, streaming replicas, etc. So don't use it unless you must.
WAL files are incremental, so the simple answer is: you cannot throw any files out. The solution is to make a new base backup; after that, all previous WAL files can be deleted.
The WAL files contain the individual changes made to the database, so if you throw some older WAL files out, the recovery process will fail (it will not silently skip missing WAL files) because the state of the database cannot be restored reliably.
You can move the WAL files to some other location without upsetting the WAL process, but then you would have to make all WAL files available again from a single location if you ever need to recover your database from some point in the past. If you are running out of disk space, that may mean recovering from some location where you have enough space to store the base backup and all WAL files. The main issue here is whether you can do that fast enough to restore a full database after an incident.
Another issue is that if you cannot identify where or when the problem occurred that needs to be corrected, your only option is to start with the base backup and then replay all the WAL files. This procedure is not difficult, but if you have an old base backup and many WAL files to process, it simply takes a lot of time.
The best approach for your case, in general, is to make a new base backup every x months and collect the WAL files that follow it. After every new base backup you can delete the old base backup and its subsequent WAL files, or move them to cheap offline storage (DVD, tape, etc.). In the case of a major incident you can quickly restore the database to a known correct state from the recent base backup and the relatively few WAL files collected since then.
A solution that we went for is executing pg_basebackup every night. This creates a base backup, and afterwards we can use pg_archivecleanup to clean up all the "old" WAL files from before that base backup, using something like
"%POSTGRES_INSTALLDIR%\bin\pg_archivecleanup" -d %WAL_backup_dir% %newestBaseFile%
Fortunately, we have never had to recover yet, but it should work in theory.
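The same idea as a rough nightly cron job on Linux (paths are examples, and identifying the newest base backup by the latest .backup history file in the archive is an assumption about your setup):

#!/bin/bash
# nightly_backup.sh -- take a fresh base backup, then drop WAL older than it
set -e
backup_dir=/backups/base/$(date +%Y%m%d)
wal_archive=/backups/wal_archive

pg_basebackup -D "$backup_dir" -Ft -z -X fetch

# the newest backup history file in the archive marks the new base backup
newest=$(ls "$wal_archive"/*.backup | sort | tail -n 1)
pg_archivecleanup "$wal_archive" "$(basename "$newest")"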
In case someone found this by searching for how to safely clean up the WAL directory under a replication architecture: consider the scenario where there are leftovers from offline replicas, i.e. unused replication slots waiting for a replica to come back online and thus keeping a lot of WAL archives on the master DB.
In our case we had an issue with a replica going down due to hardware failure. We had to recreate it along with its replication slot on the master DB, but forgot to get rid of the previously used one. Once we cleared that out, PostgreSQL got rid of the unused WAL and all was good.
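For reference, finding and dropping such a forgotten slot takes two statements (the slot name here is hypothetical):

psql -c "SELECT slot_name, active FROM pg_replication_slots;"
psql -c "SELECT pg_drop_replication_slot('old_replica_slot');"

Once the unused slot is gone, the server recycles the pinned WAL at the next checkpoint.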
You can add this script to clean up or remove pg_wal files automatically. It works on PostgreSQL 11; for another version, simply replace the path /usr/pgsql-11/bin/pg_archivecleanup with /usr/pgsql-12/bin/pg_archivecleanup (or 13, and so on).
#!/bin/bash
# Ask pg_controldata for the latest checkpoint's REDO WAL file; everything
# older than that segment is no longer needed for crash recovery.
redo_wal=$(/usr/pgsql-11/bin/pg_controldata -D /var/lib/pgsql/11/data/ | awk '/REDO WAL file/ {print $6}')
/usr/pgsql-11/bin/pg_archivecleanup -d /var/lib/pgsql/11/data/pg_wal "$redo_wal"
I have got 2 postgresql servers on 2 different computers that are not connected.
Each server holds a database with the same schema.
I would like one of the servers to be the master server: this server should store all data that is inserted on both databases.
For that I would like to import regularly (on a daily basis for example) data from one database to the second database.
It implies that I should be able to:
"dump" into file(s) all data that have been stored in the first database since a given date.
import the exported data to the second database
I haven't seen any time/date option in pg_dump/pg_restore commands.
So how could I do that?
NB: data are inserted in the database and never updated.
I haven't seen any time/date option in pg_dump/pg_restore commands.
There isn't any, and you can't do it that way. You'd have to dump and restore the whole database.
Alternatives are:
Use WAL based replication. Have the master write WAL to an archive location, using an archive_command. When you want to sync, copy all the new WAL from the master to the replica, which must be made as a pg_basebackup of the master and must have a suitable recovery.conf. The replica will replay the master's WAL to get all the master's recent changes.
Use a custom trigger-based system to record all changes to log tables. COPY those log tables to external files, then copy them to the replica. Use a custom script to apply the log table change records to the main tables.
Add a timestamp column to all your tables. Keep a record of when you last synced changes. Do a \COPY (SELECT * FROM sometable WHERE insert_timestamp > 'last_sync_timestamp') TO 'somefile' for each table, probably scripted. Copy the files to the secondary server. There, automate the process of doing a \copy sometable FROM 'somefile' to load the changes from the export files.
In your situation I'd probably do the WAL-based replication. It does mean that the secondary database must be absolutely read-only though.
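For concreteness, a minimal sketch of that WAL-based setup on pre-12 servers like these (all paths are hypothetical):

On the master, in postgresql.conf:

wal_level = hot_standby
archive_mode = on
archive_command = 'cp %p /mnt/transfer/wal/%f'

On the replica (created from a pg_basebackup of the master, with hot_standby = on in its postgresql.conf so it accepts read-only queries), in recovery.conf:

standby_mode = 'on'
restore_command = 'cp /mnt/transfer/wal/%f %p'

A daily "sync" then consists of carrying the new files from /mnt/transfer/wal over to the replica by whatever means connects the two machines; the replica replays them and stays read-only.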
I have read a lot of documentation on db2 restore, but I could not find how to perform an online restore from the last database backup without rolling forward the logs.
I will appreciate command example.
For example, my last online backup was made on 1st February. I want to do an ONLINE RESTORE of that backup, but without the logs after 1st February (similar to the offline restore option WITHOUT ROLL FORWARD).
I am using db2 9.7
Thank you in advance
The database backup contains a snapshot of the tablespaces, and they may not be in a stable state. Roll-forward is always required (unless you want to take insane risks by forcing DB2 to start using potentially corrupt data) to reach the nearest stable state.
If you are asking because you want manageable database backup dumps without having to worry about shipping logs etc., use the INCLUDE LOGS option when taking the backup. It includes in the backup file the minimum set of transaction logs required to reach a stable state. When restoring you can then use the LOGTARGET option to extract them, and then ROLLFORWARD DATABASE for the required, typically 0-x seconds' worth of logs (depending on your database transactions).
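A sketch of that flow with the DB2 9.7 command line (database name, paths and the timestamp are examples):

db2 "BACKUP DATABASE sample ONLINE TO /backups INCLUDE LOGS"
db2 "RESTORE DATABASE sample FROM /backups TAKEN AT 20130201010000 LOGTARGET /tmp/db2logs"
db2 "ROLLFORWARD DATABASE sample TO END OF BACKUP AND STOP OVERFLOW LOG PATH (/tmp/db2logs)"

TO END OF BACKUP rolls forward only through the log records captured inside the backup image, which is the closest safe equivalent of "restore without the later logs".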
A lazy DBA would probably just use RECOVER DB SAMPLE TO 2013-02-01-00.00.00 and let DB2 worry about all the details. It will automatically fetch the required database backup and transaction log files (even from backup tapes etc. if you set them up correctly) and do everything for you, as long as you don't attempt to manage them manually.
My company's website uses a PostgreSQL database. In our data center we have a master DB and a few read-only slave DB's, and we use Londiste for continuous replication between them.
I would like to setup another read-only slave DB for reporting purposes, and I'd like this slave to be in a remote location (outside the data center). This slave doesn't need to be 100% up-to-date. If it's up to 24 hours old, that's fine. Also, I'd like to minimize the load I'm putting on the master DB. Since our master DB is busy during the day and idle at night, I figure a good idea (if possible) is to get the reporting slave caught up once each night.
I'm thinking about using log shipping for this, as described on
http://www.postgresql.org/docs/8.4/static/continuous-archiving.html
My plan is:
Setup WAL archiving on the master DB
Produce a full DB snapshot and copy it to the remote location
Restore the DB and get it caught up
Go into steady state where:
DAYTIME -- the DB falls behind but people can query it
NIGHT -- I copy over the day's worth of WAL files and get the DB caught up
Note: the key here is that I only need to copy a full DB snapshot one time. Thereafter I should only have to copy a day's worth of WAL files in order to get the remote slave caught up again.
Since I haven't done log-shipping before I'd like some feedback / advice.
Will this work? Does PostgreSQL support this kind of repeated recovery?
Do you have other suggestions for how to set up a remote semi-fresh read-only slave?
thanks!
--S
Your plan should work.
As Charles says, warm standby is another possible solution. It's supported since 8.2 and has relatively low performance impact on the primary server.
Warm Standby is documented in the Manual: PostgreSQL 8.4 Warm Standby
The short procedure for configuring a standby server is as follows. For full details of each step, refer to previous sections as noted.
1. Set up primary and standby systems as near identically as possible, including two identical copies of PostgreSQL at the same release level.
2. Set up continuous archiving from the primary to a WAL archive located in a directory on the standby server. Ensure that archive_mode, archive_command and archive_timeout are set appropriately on the primary (see Section 24.3.1).
3. Make a base backup of the primary server (see Section 24.3.2), and load this data onto the standby.
4. Begin recovery on the standby server from the local WAL archive, using a recovery.conf that specifies a restore_command that waits as described previously (see Section 24.3.3).
To achieve only nightly syncs, your archive_command should exit with a non-zero exit status during daytime.
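A sketch of such a time-gated wrapper (script name, archive path and the hours are assumptions):

#!/bin/bash
# archive_at_night.sh -- set on the primary as:
#   archive_command = '/usr/local/bin/archive_at_night.sh %p %f'
# Only archive between midnight and 06:00; at any other time exit non-zero
# so PostgreSQL keeps the segment in pg_xlog and retries later.
hour=$((10#$(date +%H)))    # 10# strips the leading zero
if [ "$hour" -lt 6 ]; then
    cp "$1" /mnt/standby_archive/"$2"
else
    exit 1
fi

Note the trade-off: while the command fails, WAL accumulates in pg_xlog on the primary, so there must be enough disk space there for a full day's worth of segments.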
Additional information:
Postgres Wiki about Warm Standby
Blog Post Warm Standby Setup
9.0's built-in WAL streaming replication is designed to accomplish something that should meet your goals -- a warm or hot standby that can accept read-only queries. Have you considered using it, or are you stuck on 8.4 for now?
(Also, the upcoming 9.1 release is expected to include an updated/rewritten version of pg_basebackup, a tool for creating the initial backup point for a fresh slave.)
Update: PostgreSQL 9.1 will include the ability to pause and resume streaming replication with a simple function call on the slave.
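For reference, those calls on the slave look like this (the functions carry these names from 9.1 through 9.6; PostgreSQL 10 renamed them to pg_wal_replay_pause/pg_wal_replay_resume):

SELECT pg_xlog_replay_pause();   -- stop applying incoming WAL; queries see a frozen snapshot
SELECT pg_xlog_replay_resume();  -- continue applying WAL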