I have enabled WAL archiving in the PostgreSQL configuration file. When I restarted the PostgreSQL service, WAL recovery did not work: there were no WAL recovery entries in the logs.
Steps I followed:
Created directories for the WAL archive and base backups:
mkdir -p /var/lib/pgsql/wals/
mkdir -p /var/lib/pgsql/backups/
chown postgres:postgres -R /var/lib/pgsql/backups/
chown postgres:postgres -R /var/lib/pgsql/wals/
Edited the postgresql.conf with the below changes:
wal_level=archive
archive_mode=on
archive_command = 'test ! -f /var/lib/pgsql/wals/%f && cp %p /var/lib/pgsql/wals/%f'
sudo service postgresql restart 10
sudo su - postgres
pg_basebackup -D /var/lib/pgsql/data #created base backup
tar -C /var/lib/pgsql/data/ -czvf /var/lib/pgsql/backups/pg_basebackup_backup.tar.gz .
Deleted two rows of data in my database and stopped the Postgres service:
sudo service postgresql stop 10
Extracted the base backup:
tar xvf /var/lib/pgsql/backups/pg_basebackup_backup.tar.gz -C /var/lib/pgsql/data
Created recovery.conf with the below content and restarted postgres service
echo "restore_command = 'cp /var/lib/pgsql/wals/%f %p'">/var/lib/pgsql/recovery.conf
cp /var/lib/pgsql/recovery.conf /var/lib/pgsql/data/
sudo service postgresql stop 10
sudo service postgresql start 10
There were no WAL recovery entries in the logs, and the two rows I deleted were not restored.
pg_basebackup can capture all WAL segments during the backup. I take base backups in tar format with the "-X stream" option and everything works well. See here - pg_basebackup – bash script for backup and archiving on Google storage
It works excellently - I back up a database 4.5+ TB big, which takes almost 2 days.
Restoration is described here - pg_basebackup / pg-barman – restore tar backup
It all works - we have already had incidents where we had to restore from these backups.
The WAL containing the two deleted rows probably hasn't been archived yet.
archive_command is run whenever a 16 MB WAL segment is full or something forces a log switch.
To be able to recover the latest changes that have not been archived yet, you have to copy the contents of pg_wal to the pg_wal directory in the restored base backup.
You can also consider streaming replication or pg_receivewal if you cannot afford to lose transactions.
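The overwrite guard in the archive_command from the question can be sanity-checked without a running server. This is a sketch only: it uses throwaway mktemp paths in place of the real archive directory and WAL segment, and the segment name is just a plausible-looking placeholder.

```shell
archive_dir=$(mktemp -d)         # stands in for /var/lib/pgsql/wals
segment=$(mktemp)                # stands in for a completed segment (%p)
name=000000010000000000000001    # stands in for %f (placeholder name)
echo "wal-bytes" > "$segment"

# First attempt: the file is absent, so the guard passes and the copy runs
test ! -f "$archive_dir/$name" && cp "$segment" "$archive_dir/$name"

# Second attempt: the guard fails, so the archived segment is never overwritten
test ! -f "$archive_dir/$name" && cp "$segment" "$archive_dir/$name" \
  || echo "refused to overwrite already-archived segment"
```

This mirrors why the docs recommend the `test ! -f` prefix: archive_command must fail rather than clobber a segment that was already archived.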
We are using the command below to take a backup of the database.
$PGHOME/bin/pg_basebackup -p 5433 -U postgres -P -v -x --format=tar --gzip --compress=1 --pgdata=- -D /opt/rao
While taking the backup, we received the error below.
transaction log start point: 285/8F000080
pg_basebackup: could not get transaction log end position from server: FATAL: requested WAL segment 00000001000002850000008F has already been removed
Please explain why this happens and how to handle the error. If I should change any option in my pg_basebackup command, let me know.
Please also clarify what "--pgdata=- -D" means in my pg_basebackup command above.
-D directory
--pgdata=directory
This specifies the directory to write the output to.
When the backup is in tar mode, and the directory is specified as - (dash), the tar file will be written to stdout. This parameter is required.
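The "-" value is the same write-to-stdout convention that plain tar uses, so the pipeline shape can be illustrated without a PostgreSQL server at all. A sketch on throwaway mktemp paths:

```shell
src=$(mktemp -d)    # stands in for the data being backed up
dest=$(mktemp -d)   # where we will unpack to verify
out=$(mktemp)       # stands in for the backup file
echo "hello" > "$src/file.txt"

# Write the archive to stdout ('-f -') and pipe it into gzip, the same
# shape as: pg_basebackup --format=tar -D - | gzip > backup.tar.gz
tar -C "$src" -cf - . | gzip > "$out"

# Round-trip it to confirm the stream was a valid tar archive
tar -xzf "$out" -C "$dest"
cat "$dest/file.txt"
```

In the question's command, `--pgdata=-` and `-D /opt/rao` are the same option given twice, which is likely not what was intended; pick one or the other.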
FATAL: requested WAL segment 00000001000002850000008F has already been removed
means that the master hasn't kept enough history to bring the standby back up to date.
You can use pg_basebackup to create a new slave:
pg_basebackup -h masterhost -U postgres -D path --progress --verbose -c fast
If you have a WAL archive, you can try restore_command instead. Note that pg_basebackup creates an entirely new slave in an empty directory.
I work with PostgreSQL on large datasets, and I am moving my database from the HDD where Ubuntu 18.04 is installed to another HDD.
I did the same process before and moved the data_directory to "Home" (on the same HDD as Ubuntu, but in another partition), and in that location PostgreSQL works fine. The problem is that I need a lot of space for my data (around 2 TB), which is why I am trying to move to another HDD, empty and formatted exclusively for the DB.
I followed this tutorial: Link of change Data Folder. I used the following commands in the Ubuntu terminal:
su postgres
/usr/lib/postgresql/10/bin/pg_ctl -D /media/path/postgresql/10/main -l logfile start
sudo systemctl stop postgresql
sudo systemctl status postgresql
sudo rsync -av /var/lib/postgresql /media/path/postgresql
sudo mv /var/lib/postgresql/10/main /var/lib/postgresql/10/main.bak
sudo gedit /etc/postgresql/10/main/postgresql.conf (change "data_directory")
sudo systemctl start postgresql
sudo systemctl status postgresql
In addition, I gave the postgres user ownership of the folder:
chown -R postgres:postgres /media/path/postgresql/10/main
I made the modifications in "/path/postgresql.conf" to point to the new data directory and restarted the server, but I can't connect to it.
In the terminal I get this message when I try to connect:
"could not connect to server: No such file or directory Is the server running locally and accepting connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?"
In the pgAdmin client the problem shows up simply as no connection.
In my case, rsync did not copy several files for some reason. Maybe they were not ready yet after the server stopped, or something else.
After I ran it once more the next day, the server started fine:
sudo rsync -av /var/lib/postgresql/10/main.bak/ /media/disk1/postgres/postgresql/10/main/
Also check the log for the actual reason:
tail -n 100 /var/log/postgresql/postgresql-10-main.log
In my case it was:
postgres: could not find the database system
Expected to find it in the directory "/media/disk1/postgres/postgresql/10/main",
but could not open file "/media/disk1/postgres/postgresql/10/main/global/pg_control": No such file or directory
pg_ctl: could not start server
Examine the log output.
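One cheap pre-flight check before starting the server after a move: verify that global/pg_control, the file the error above complains about, survived the copy. A minimal sketch; the mktemp path here is a stand-in, so point datadir at your real data_directory instead.

```shell
# "$datadir" is a throwaway stand-in for /media/disk1/postgres/postgresql/10/main
datadir=$(mktemp -d)
mkdir -p "$datadir/global"
: > "$datadir/global/pg_control"   # any complete copy contains this file

if [ -f "$datadir/global/pg_control" ]; then
  echo "data directory looks complete"
else
  echo "copy incomplete: global/pg_control is missing"
fi
```

If the file is missing, rerun the rsync with the server stopped before attempting another start.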
I have a DB with 150 GB of data. I am using mongodump and mongorestore to back up and restore.
My production server is running Mongo 2.2 and the test server is on 2.6.1.
When I take a backup from the production server (Mongo 2.2), it takes a long time for 150 GB of data, and restoration takes 6-8 hours. It doesn't complete without errors; sometimes the restore drops automatically and we need to run it again or restore the missed collections.
Is there a better way to back up and restore, so we can save time and run it without errors?
Regards,
Rishi
You have a couple of options for native backup and restore functionality, and these are listed very well in the documentation at http://docs.mongodb.org/manual/administration/backup/.
Just to summarize, as your data grows, mongodump / mongorestore becomes less ideal for backup / restore purpose and you should start looking at other options like:
File system based snapshots or LVM snapshots (since you are on EC2, this should be fairly straightforward)
MMS Backup
The best method is to back up and restore using LVM on a Linux system.
Creating a Snapshot:
lvcreate --size 100M --snapshot --name mdb-snap01 /dev/vg0/mongodb
Archive a Snapshot:
umount /dev/vg0/mdb-snap01
dd if=/dev/vg0/mdb-snap01 | gzip > mdb-snap01.gz
Restore a Snapshot:
lvcreate --size 1G --name mdb-new vg0
gzip -d -c mdb-snap01.gz | dd of=/dev/vg0/mdb-new
mount /dev/vg0/mdb-new /srv/mongodb
Restore Directly from a Snapshot:
umount /dev/vg0/mdb-snap01
lvcreate --size 1G --name mdb-new vg0
dd if=/dev/vg0/mdb-snap01 of=/dev/vg0/mdb-new
mount /dev/vg0/mdb-new /srv/mongodb
Remote Backup Storage:
umount /dev/vg0/mdb-snap01
dd if=/dev/vg0/mdb-snap01 | ssh username@example.com gzip > /opt/backup/mdb-snap01.gz
lvcreate --size 1G --name mdb-new vg0
ssh username@example.com gzip -d -c /opt/backup/mdb-snap01.gz | dd of=/dev/vg0/mdb-new
mount /dev/vg0/mdb-new /srv/mongodb
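The dd/gzip archive-and-restore pipelines above can be rehearsed end-to-end without LVM by substituting plain files for the logical volumes. A round-trip sketch on throwaway mktemp paths:

```shell
work=$(mktemp -d)
printf 'mongo-datafiles' > "$work/mdb-snap01"   # stands in for /dev/vg0/mdb-snap01

# Archive step: same dd | gzip shape, reading a file instead of an LV
dd if="$work/mdb-snap01" 2>/dev/null | gzip > "$work/mdb-snap01.gz"

# Restore step: gzip -d -c | dd of=..., writing a file instead of a new LV
gzip -d -c "$work/mdb-snap01.gz" | dd of="$work/mdb-new" 2>/dev/null

# The restored copy must be byte-identical to the original
cmp "$work/mdb-snap01" "$work/mdb-new"
```

The real commands differ only in that dd reads from and writes to /dev/vg0 block devices, which requires root and an existing volume group.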
I want to create a database with the data files and the wal on different filesystems. I want the wal on a separate server over NFS, to avoid a loss of data in case of a fs/disk crash.
Where is the wal written?
Can I force it to a different location than the default via the configuration?
I'm on 9.1 if that matters.
Thanks.
The WAL files are written to the directory pg_xlog inside the data directory. Starting with Postgres 10, this directory was renamed to pg_wal.
E.g. /var/lib/postgresql/10/main/pg_wal
See the manual for details:
http://www.postgresql.org/docs/9.1/static/wal-configuration.html
http://www.postgresql.org/docs/current/static/wal-configuration.html
If I'm not mistaken, this directory name cannot be changed, but it can be a symbolic link that points to a different disk.
In fact, this is actually recommended to tune WAL performance (see here: http://wiki.postgresql.org/wiki/Installation_and_Administration_Best_practices#WAL_Directory)
To move the WAL directory to another file path/disk drive, follow the steps below:
Descriptive Steps
Turn off Postgres to protect against corruption
Copy the WAL directory (by default on Ubuntu: /var/lib/postgresql/<version>/main/pg_wal) to the new file path using rsync. With the -a flag it preserves file/folder permissions and the folder structure. Leave off the trailing slash.
Verify the contents copied correctly
Rename pg_wal to pg_wal-backup in the Postgres data directory ($PG_DATA)
Create a symbolic link to the new path to pg_wal in the Postgres data directory ($PG_DATA) and update the permissions of the symbolic link to be the postgres user
Start Postgres and verify that you can connect to the database
Optionally, delete the pg_wal-backup directory in the Postgres data directory ($PG_DATA)
Matching Commands
sudo service postgresql stop
sudo rsync -av /var/lib/postgresql/12/main/pg_wal /<new_path>
ls -la /<new_path>
sudo mv /var/lib/postgresql/12/main/pg_wal /var/lib/postgresql/12/main/pg_wal-backup
sudo ln -s /<new_path> /var/lib/postgresql/12/main/pg_wal
sudo chown -h postgres:postgres /var/lib/postgresql/12/main/pg_wal
sudo service postgresql start && sudo service postgresql status
# Verify DB connection using your db credentials/information
psql -h localhost -U postgres -p 5432
rm -rf /var/lib/postgresql/12/main/pg_wal-backup
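The mv/ln -s steps above can be rehearsed first on a toy directory tree; all paths below are throwaway mktemp stand-ins for the real data directory and new disk, and the segment name is just a placeholder.

```shell
base=$(mktemp -d)
data="$base/main"          # stands in for /var/lib/postgresql/12/main
newdisk=$(mktemp -d)       # stands in for /<new_path>
mkdir -p "$data/pg_wal"
echo "segment-bytes" > "$data/pg_wal/000000010000000000000001"

mv "$data/pg_wal" "$newdisk/pg_wal"      # in the real case: rsync, verify, rename
ln -s "$newdisk/pg_wal" "$data/pg_wal"   # symlink back into the data directory
# (the real procedure also needs: chown -h postgres:postgres on the link)

# Reads through the data-directory path still work, via the link
cat "$data/pg_wal/000000010000000000000001"
```

Postgres follows the symlink transparently, which is why this works without any configuration change.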
Due to a sudden power outage, the Postgres server running on my local machine shut down abruptly. After rebooting, I tried to restart Postgres and I get this error:
$ pg_ctl -D /usr/local/pgsql/data restart
pg_ctl: PID file "/usr/local/pgsql/data/postmaster.pid" does not exist
Is server running?
starting server anyway
server starting
$:/usr/local/pgsql/data$ LOG: database system shutdown was interrupted at 2009-02-28 21:06:16
LOG: checkpoint record is at 2/8FD6F8D0
LOG: redo record is at 2/8FD6F8D0; undo record is at 0/0; shutdown FALSE
LOG: next transaction ID: 0/1888104; next OID: 1711752
LOG: next MultiXactId: 2; next MultiXactOffset: 3
LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at 2/8FD6F918
LOG: record with zero length at 2/8FFD94A8
LOG: redo done at 2/8FFD9480
LOG: could not fsync segment 0 of relation 1663/1707047/1707304: No such file or directory
FATAL: storage sync failed on magnetic disk: No such file or directory
LOG: startup process (PID 5465) exited with exit code 1
LOG: aborting startup due to startup process failure
There is no postmaster.pid file in the data directory. What possibly could be the reason for this sort of behavior and of course what is the way out?
You'd need to run pg_resetxlog. Your database can be in an inconsistent state after this though, so dump it with pg_dumpall, recreate it, and import the dump back.
A cause for this could be:
You have not turned off the hardware write cache on the disk, which often prevents the OS from making sure data is written before it reports a successful write to the application. Check with
hdparm -I /dev/sda
If it shows "*" before "Write cache", then this could be the case. The PostgreSQL source has a program, src/tools/fsync/test_fsync.c, which tests the speed of syncing data to disk. Run it: if it reports all times shorter than, say, 3 seconds, then your disk is lying to the OS. On a 7500 rpm disk, a test of 1000 writes to the same place needs at least 8 seconds to complete (1000/(7500 rpm/60 s)), as it can only write to that spot once per rotation. You'd need to edit test_fsync.c if your database is on a different disk than the /var/tmp partition: change
#define FSYNC_FILENAME "/var/tmp/test_fsync.out"
to
#define FSYNC_FILENAME "/usr/local/pgsql/data/test_fsync.out"
Your disk is failing and has a bad block; check with badblocks.
You have bad RAM; check with memtest86+ for at least 8 hours.
Reading a few similar messages in the archives of the PostgreSQL mailing list ("storage sync failed on magnetic disk: No such file or directory") seems to indicate that there is very serious hardware trouble, much worse than a simple power failure. You may have to prepare yourself to restore from backups.
I had DB corruption too; these were my actions:
docker run -it --rm -v /path/to/db:/var/lib/postgresql/data postgres:10.3 bash
su - postgres
/usr/lib/postgresql/10/bin/pg_resetwal -D /var/lib/postgresql/data -f
I had this same problem and I was about to dump, reinstall, and import from a DB dump (a really painful process), but I tried this as a last resort and it worked!
brew services start postgresql
Then I restarted and that was it.
Run start instead of restart.
Execute the below command:
$ pg_ctl -D /usr/local/pgsql/data start
I had this problem a couple of times, when my laptop turned off unexpectedly on very low battery while PSQL was running in the background.
My solution after searching all over was a hard delete and reinstall, then importing data from a DB dump.
Steps for Mac with brew to uninstall and reinstall psql 9.6
brew uninstall postgresql@9.6
rm -rf /usr/local/var/postgresql@9.6
rm -rf .psql.local .psql_history .psqlrc.local .psqlrc .pgpass
brew install postgresql@9.6
echo 'export PATH="/usr/local/opt/postgresql@9.6/bin:$PATH"' >> ~/.bash_profile
source ~/.bash_profile
brew services start postgresql@9.6
createuser -s postgres
createuser {ENTER_YOUR_USER_HERE} --interactive
As others stated, a stop + start instead of a restart worked for me. In a Docker environment this would be:
docker stop <container_name>
docker start <container_name>
or when using Docker Compose:
docker-compose stop
docker-compose start