PostgreSQL – waiting for checkpoint to complete

When trying to perform pg_basebackup on a replica, I always get the following message:
postgres@db1:~/10$ pg_basebackup -h foo.bar.com -U repluser -D /var/lib/postgresql/10/main -v -P
pg_basebackup: initiating base backup, waiting for checkpoint to complete
I've tried waiting, but nothing happens. Is it possible to speed up the process?

Call pg_basebackup with the option --checkpoint=fast to force a fast checkpoint rather than waiting for a spread one to complete.
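As a minimal sketch of that suggestion, the question's command can be wrapped with --checkpoint=fast added. The function name fast_basebackup and the PG_BASEBACKUP override variable are hypothetical helpers introduced only for illustration (the override makes a dry run possible); the pg_basebackup flags themselves are the ones from the question plus --checkpoint=fast.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: same flags as the question, plus --checkpoint=fast.
# PG_BASEBACKUP is an assumed override so the command can be dry-run with "echo".
fast_basebackup() {
    local host="$1" user="$2" target="$3"
    "${PG_BASEBACKUP:-pg_basebackup}" -h "$host" -U "$user" -D "$target" \
        --checkpoint=fast -v -P
}
```

For example, fast_basebackup foo.bar.com repluser /var/lib/postgresql/10/main runs the backup, while setting PG_BASEBACKUP=echo first only prints the arguments.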

It's possible to force a checkpoint to complete. To do so, run CHECKPOINT; on the master server:
$ sudo su - postgres
$ psql
postgres=# CHECKPOINT;

Note that while the accepted answer (running the CHECKPOINT command on the primary server) works, CHECKPOINT is not intended for use during normal operation, according to the PostgreSQL documentation:
https://www.postgresql.org/docs/current/sql-checkpoint.html
So avoid running it while the server is handling normal traffic from your applications.

Related

PostgreSQL server fails to start on ArchLinux: FATAL: could not create lockfile »/run/postgresql/.s.PGSQL.5432.lock«

I am quite new to Arch and a total beginner in PostgreSQL, so this may be a very basic question.
I installed postgresql 11.5-4 from extra and pgadmin 4 from AUR, both seem to be running well.
I created a test DB with the following command:
initdb -D /home/lg/test-db
I got the answer:
You can start the db-server using:
pg_ctl -D /home/lg/test-db -l logdatei start
I tried that and got:
pg_ctl -D /home/lg/test-db -l logdatei start
waiting for server to start.... stopped
pg_ctl: could not start the server
check the log.
The log only says that the lock file »/run/postgresql/.s.PGSQL.5432.lock« could not be created because the directory could not be found. There is no directory called "postgresql" under /run. I suppose PostgreSQL cannot create this directory because it does not have permission. Several posts online suggest changing the user/owner of the db with sudo; PostgreSQL prevents this, however. When I try any command with sudo, PostgreSQL tells me that the command can't be run as root. There must be some very basic error in my thinking here, but I have not worked it out in 3 hours.
You'll have to remove /run/postgresql from unix_socket_directories in postgresql.conf before starting the server.
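For a throwaway test cluster, the same setting can also be overridden on the command line instead of editing postgresql.conf. The sketch below is an alternative to the edit described above, not a replacement for it; start_with_socket_dir and the PG_CTL override variable are hypothetical names, and /tmp as the socket directory is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start a cluster with its Unix socket in a directory
# the current user can write to. "-o" passes options through to postgres.
# PG_CTL is an assumed override so the command can be dry-run with "echo".
start_with_socket_dir() {
    local datadir="$1" sockdir="$2" logfile="$3"
    "${PG_CTL:-pg_ctl}" -D "$datadir" -l "$logfile" \
        -o "-c unix_socket_directories=$sockdir" start
}
```

With the question's paths this would be start_with_socket_dir /home/lg/test-db /tmp logdatei. Clients then need to be pointed at the same directory, for example psql -h /tmp.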
You probably have /var/run symlinked to /run, and /run is on tmpfs, so the directory does not survive a reboot. You should add something like d /run/postgresql 0755 postgres postgres - to /usr/lib/tmpfiles.d/postgresql.conf
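A small sketch of generating that tmpfiles.d line follows. The function name tmpfiles_entry and its defaults are hypothetical; the line format is the one quoted in the answer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a tmpfiles.d "d" (create directory) entry.
# Defaults mirror the line suggested in the answer above.
tmpfiles_entry() {
    local dir="${1:-/run/postgresql}" owner="${2:-postgres}" group="${3:-postgres}"
    printf 'd %s 0755 %s %s -\n' "$dir" "$owner" "$group"
}
```

One way to apply it, assuming a systemd-based system, is tmpfiles_entry | sudo tee /etc/tmpfiles.d/postgresql.conf followed by sudo systemd-tmpfiles --create /etc/tmpfiles.d/postgresql.conf. Using /etc/tmpfiles.d rather than /usr/lib/tmpfiles.d is an assumption here: it is the location meant for local administrator overrides.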

Correct way to terminate a postgres replication mode connection

I have the following code that enables logical decoding in PostgreSQL 9.4:
pg_recvlogical -h localhost --slot test_slot --create-slot
pg_recvlogical -h localhost --slot test_slot --start -f -
I spawn a node.js subprocess to run this code and listen for changes; however, I'm unsure what the correct procedure is to terminate the connection. I usually just press Ctrl+C on the command line or kill the subprocess in code, but I always get a "pg_recvlogical: unexpected termination of replication stream:" error. What is the correct way to terminate this connection?
The PostgreSQL documentation says,
--start
Begin streaming changes from the logical replication slot specified by --slot, continuing until terminated by a signal. If the server side change stream ends with a server shutdown or disconnect, retry in a loop unless --no-loop is specified.
So I assume the standard Linux signal meanings apply. Therefore, SIGINT or SIGQUIT should be acceptable, giving the app time to finish.
What I do is send SIGINT, then loop checking whether /proc/$PID_/stat still exists; after a custom timeout I send SIGKILL, and then wait on $PID_.
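kill -2 "$PID_"                      # SIGINT: ask for a clean shutdown
for i in `seq 1 8`; do
    if [ ! -f "/proc/$PID_/stat" ] ; then
        break                        # process has exited
    fi
    sleep 1
done
if [ -f "/proc/$PID_/stat" ] ; then
    kill -9 "$PID_"                  # still alive after the timeout: force it
fi
wait "$PID_"                         # must run in this shell, not inside backticks
RETURN_CODE=$?                       # exit status of the child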
This is Bash, but any decent platform/language will have tools to do it.
Shortened, so there may be a bug :)
I did the following to kill a zombie active replication slot.
Connect over SSH to the server where Postgres is installed:
ssh -i <generated_security_key_name/path>.pem <username>@<hostIp>
Once connected to the VM, switch to the Postgres OS user:
sudo -i -u <postgres-username>
Start psql, the terminal-based front-end to PostgreSQL:
psql
Check the PID attached to your replication slot (the active_pid column):
select * from pg_replication_slots;
Exit psql, then kill that process:
sudo kill <pid>
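The lookup and the kill can also be combined into one SQL statement using pg_terminate_backend, which sends the same SIGTERM from inside the server. This is a sketch under assumptions: terminate_slot_sql is a hypothetical helper name, and the active_pid column requires PostgreSQL 9.5 or later (it is not available on the 9.4 server from the question).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print SQL that terminates the backend holding a slot.
# Assumes PostgreSQL 9.5+ (pg_replication_slots.active_pid).
terminate_slot_sql() {
    local slot
    # Double any single quotes so the name is a safe SQL string literal.
    slot=$(printf '%s' "$1" | sed "s/'/''/g")
    printf "SELECT pg_terminate_backend(active_pid) FROM pg_replication_slots WHERE slot_name = '%s' AND active;\n" "$slot"
}
```

Usage would look like terminate_slot_sql test_slot | psql -X, run as a superuser.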

Automating pg_basebackup command for recovery using cron and script

I have a 2 node system running pacemaker, corosync and postgresql 9.4. I am carrying out pgsql replication using virtual IP and am able to successfully recover a downed machine using manual commands. Now to automate stuff, I want to run the following commands in a script to get my recovered master back on cluster.
`#su -postgres
$rm -rf /var/lib/pgsql/9.4/data/* //To delete old data files
$pg_basebackup -h 192.XX.XX.XX -U postgres -D /var/lib/pgsql/9.4/data -X stream -P // To recover the latest data from standby PC running latest entries
$rm /var/lib/pgsql/tmp/PGSQL.lock
$exit //exit from postgresql shell
#pcs resource cleanup msPostgresql`
Now when I run these commands as a script, it hangs after the first command, su -postgres: the cursor blinks at the bash$ prompt and the commands below it are never executed.
I want to automate this process using cron, but the script itself is not working for me. Can someone help me out here?
Regards
As far as I know, "su -postgres" is wrong. You can use either "su postgres" or "sudo -i -u postgres".
Regarding the scripts here you can find tested, working scripts. The one you are interested in is called "initiate_replication.sh" there.
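Independently of the su syntax, a script cannot "type" commands into the shell that su opens: su starts a new shell, and the remaining lines of the script only run after that shell exits. A non-interactive sketch is shown below, running each step as postgres via sudo -u instead. The names run, resync_from_standby and DRY_RUN are hypothetical; the paths and the pcs resource name are taken from the question, and GNU find is assumed.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run switch: with DRY_RUN set, print commands instead of running them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Rebuild the local data directory from the standby, then clean up the resource.
# Each step runs only if the previous one succeeded.
resync_from_standby() {
    local standby="$1" datadir="$2"
    # find is used instead of "rm -rf dir/*" so no glob has to be expanded by root
    run sudo -u postgres find "$datadir" -mindepth 1 -delete &&
    run sudo -u postgres pg_basebackup -h "$standby" -U postgres -D "$datadir" -X stream -P &&
    run sudo -u postgres rm -f /var/lib/pgsql/tmp/PGSQL.lock &&
    run pcs resource cleanup msPostgresql
}
```

Called as resync_from_standby 192.XX.XX.XX /var/lib/pgsql/9.4/data from root's crontab, this needs no terminal. Setting DRY_RUN=1 first prints the four commands without touching anything, which is worth doing before letting cron delete a data directory.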

Unable to start Postgres Server because of permission denied on lock file

I restarted my Postgres server, but now it will not start.
I checked my "pgstartup.log" log file. This says:
creating system views ... ok
loading system objects' descriptions ... ok
creating conversions ... ok
creating dictionaries ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok
copying template1 to postgres ... ok
Success. You can now start the database server using:
/usr/bin/postgres -D /var/lib/pgsql/data
/usr/bin/pg_ctl -D /var/lib/pgsql/data -l logfile start
FATAL: could not open lock file "/tmp/.s.PGSQL.5432.lock": Permission denied
FATAL: could not open lock file "/tmp/.s.PGSQL.5432.lock": Permission denied
Do you think deleting /tmp/.s.PGSQL.5432.lock would work ?
Postgres cannot write a file to /tmp because of the permissions set on the /tmp directory.
As the root user, run in a terminal:
chmod 1777 /tmp
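Before changing anything, it may help to confirm that /tmp really has the wrong mode. The check below is a sketch; is_sticky_world_writable is a hypothetical helper name, and it simply compares the permission string printed by ls against the one that mode 1777 produces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the directory has mode 1777
# (world-writable with the sticky bit), which ls shows as drwxrwxrwt.
is_sticky_world_writable() {
    local perms
    # -L follows a symlink, e.g. /tmp -> private/tmp on macOS
    perms=$(ls -ldL "$1" | cut -c1-10)
    [ "$perms" = "drwxrwxrwt" ]
}
```

For example: is_sticky_world_writable /tmp || sudo chmod 1777 /tmp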
PostgreSQL normally deletes the lock file when it terminates correctly.
The error is probably due to another PostgreSQL instance, run by a different user, that was terminated abnormally (a kill -9 to the postmaster).
So, if you are sure no Postgres processes are running, you can probably delete that file without any issue. You should also check with the ipcs command whether there is any stale shared memory segment, and if so delete it with ipcrm.
Probably the best way to address all these things at once is rebooting the server.
P.S.: never kill -9 any PostgreSQL process.
It looks like you probably have another PostgreSQL instance running on the same port as a different user, or you previously started this PostgreSQL instance as a different user then stopped it uncleanly.
Check the ownership of /tmp/.s.PGSQL.5432.lock:
ls -l /tmp/.s.PGSQL.5432.lock
Does it match the user you're running PostgreSQL as?
It's relatively harmless to delete the lock files in /tmp/. (Never, ever delete the postmaster.pid lock file though). If the other PostgreSQL instance is still running you'll lose the ability to connect to it over a unix socket, or you might get an error about being unable to bind to port 5432 on tcp.
I agree with @mnencia that a server reboot is the best option if it's easy and practical.
If you know that no other Postgres processes are running, please delete these 2 files and try again:
$ sudo rm /tmp/.s.PGSQL.5432.lock
$ sudo rm /tmp/.s.PGSQL.5432
Then, you would be able to run the server as a background process with the command:
$ pg_ctl -D /usr/local/var/postgres start
If you are on OS X, put the alias in ~/.bash_profile as below:
alias pgb='pg_ctl -D /usr/local/var/postgres start'
Now, source it with the command:
$ source ~/.bash_profile
The Postgres server will run with the command:
$ pgb
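The "if you know that no other Postgres processes are running" condition can be made explicit in a small guard. This is only a sketch: remove_stale_socket_files and the PGREP override variable are hypothetical names, and it assumes the server process is called postgres.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete the socket and its lock file only when no
# process named "postgres" is running. PGREP is an assumed override for testing.
remove_stale_socket_files() {
    local dir="${1:-/tmp}" port="${2:-5432}"
    if "${PGREP:-pgrep}" -x postgres >/dev/null 2>&1; then
        echo "postgres is still running; not removing anything" >&2
        return 1
    fi
    rm -f "$dir/.s.PGSQL.$port.lock" "$dir/.s.PGSQL.$port"
}
```

If the files belong to another user, the function has to be run with sufficient privileges, as with the sudo rm commands above.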
For me it was a permission problem on the database directory: it was group/world readable, which is wrong. The data directory should be 0700. Aft
Thanks for the suggestions.
First I tried changing the permission of the lock file but it didn't work
Later I deleted the lock file which resolved my issue.
Thanks
Thanks.
I was trying to install Postgres on my Mac. I was receiving
FATAL: could not open lock file "/tmp/.s.PGSQL.5432.lock": Permission denied
After deleting the /tmp/.s.PGSQL.5432.lock file, the server started working.
Yes, I had the same issue, and I fixed it by running these commands:
$ sudo rm /tmp/.s.PGSQL.5432.lock
$ sudo rm /tmp/.s.PGSQL.5432
$ pg_ctl -D /usr/local/var/postgres start
I had killed the Postgres service running on port 5432, which is something you should never do, though it can happen unknowingly. Running the above commands removes the stale lock file, and starting Postgres again then creates a fresh server process.
Not sure, but I think the lock file is held by another Postgres instance.
Restart the PC; that instance will then release the lock file and the Postgres server will start. It worked for me.

How do I fix Postgres so it will start after an abrupt shutdown?

Due to a sudden power outage, the Postgres server running on my local machine shut down abruptly. After rebooting, I tried to restart Postgres and I get this error:
$ pg_ctl -D /usr/local/pgsql/data restart
pg_ctl: PID file "/usr/local/pgsql/data/postmaster.pid" does not exist
Is server running?
starting server anyway
server starting
$:/usr/local/pgsql/data$ LOG: database system shutdown was interrupted at 2009-02-28 21:06:16
LOG: checkpoint record is at 2/8FD6F8D0
LOG: redo record is at 2/8FD6F8D0; undo record is at 0/0; shutdown FALSE
LOG: next transaction ID: 0/1888104; next OID: 1711752
LOG: next MultiXactId: 2; next MultiXactOffset: 3
LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at 2/8FD6F918
LOG: record with zero length at 2/8FFD94A8
LOG: redo done at 2/8FFD9480
LOG: could not fsync segment 0 of relation 1663/1707047/1707304: No such file or directory
FATAL: storage sync failed on magnetic disk: No such file or directory
LOG: startup process (PID 5465) exited with exit code 1
LOG: aborting startup due to startup process failure
There is no postmaster.pid file in the data directory. What possibly could be the reason for this sort of behavior and of course what is the way out?
You'd need to run pg_resetxlog (renamed pg_resetwal in PostgreSQL 10). Your database can be in an inconsistent state after this, though, so dump it with pg_dumpall, then recreate the cluster and import the dump back.
A cause for this could be:
You have not turned off the hardware write cache on the disk, which often prevents the OS from making sure data is written before it reports a successful write to the application. Check with
hdparm -I /dev/sda
If it shows "*" before "Write cache" then this could be the case. The PostgreSQL source has a program, src/tools/fsync/test_fsync.c, which tests the speed of syncing data to disk. Run it: if it reports all times shorter than, say, 3 seconds, then your disk is lying to the OS. On a 7500 rpm disk, a test of 1000 writes to the same place would need at least 8 seconds to complete (1000/(7500rpm/60s)), as it can only write once per rotation. You'd need to edit test_fsync.c if your database is on a different disk than the /var/tmp partition; change
#define FSYNC_FILENAME "/var/tmp/test_fsync.out"
to
#define FSYNC_FILENAME "/usr/local/pgsql/data/test_fsync.out"
Your disk is failing and has a bad block; check with badblocks.
You have bad RAM; check with memtest86+ for at least 8 hours.
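The rotation arithmetic from the first cause above can be reproduced for other disk speeds. The helper name min_fsync_seconds is hypothetical; the model is the one the answer uses, at most one synced write to the same location per platter rotation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lower bound, in seconds, for N synced writes to one
# location on a spinning disk, assuming one write per rotation.
min_fsync_seconds() {
    local rpm="$1" writes="$2"
    # rotations per second = rpm / 60, so time = writes * 60 / rpm
    awk -v rpm="$rpm" -v n="$writes" 'BEGIN { printf "%.1f\n", n * 60 / rpm }'
}

min_fsync_seconds 7500 1000   # prints 8.0, the figure quoted in the answer
```

A measured time far below this bound suggests that the writes are being acknowledged from a cache rather than from the platter.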
Reading a few similar messages in the archives of the PostgreSQL mailing list ("storage sync failed on magnetic disk: No such file or directory") seems to indicate that there is very serious hardware trouble, much worse than a simple power failure. You may have to prepare yourself to restore from backups.
I had db corruption too. My actions:
docker run -it --rm -v /path/to/db:/var/lib/postgresql/data postgres:10.3 bash
su - postgres
/usr/lib/postgresql/10/bin/pg_resetwal -D /var/lib/postgresql/data -f
I had this same problem and was about to dump, reinstall and import from a db dump (a really painful process); however, I tried this as a last resort and it worked!
brew services start postgresql
Then I restarted and that was it.
Run start instead of restart.
Execute the below command:
$ pg_ctl -D /usr/local/pgsql/data start
I had this problem a couple of times when my laptop turned off unexpectedly on very low battery while PostgreSQL was running in the background.
My solution, after searching all over, was a hard delete and reinstall, then importing the data from a db dump.
Steps for Mac with brew to uninstall and reinstall psql 9.6
brew uninstall postgresql@9.6
rm -rf /usr/local/var/postgresql@9.6
rm -rf .psql.local .psql_history .psqlrc.local l.psqlrc .pgpass
brew install postgresql@9.6
echo 'export PATH="/usr/local/opt/postgresql@9.6/bin:$PATH"' >> ~/.bash_profile
source ~/.bash_profile
brew services start postgresql@9.6
createuser -s postgres
createuser {ENTER_YOUR_USER_HERE} --interactive
As others stated, a stop + start instead of a restart worked for me. In a Docker environment this would be:
docker stop <container_name>
docker start <container_name>
or when using Docker Compose:
docker-compose stop
docker-compose start
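Outside Docker, the same stop-then-start sequence can be sketched with pg_ctl. The function name stop_then_start and the PG_CTL override variable are hypothetical; the data directory in the usage line is the one from the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: explicit stop followed by start, instead of "restart".
# PG_CTL is an assumed override so the sequence can be dry-run with "echo".
stop_then_start() {
    local datadir="$1"
    # A failed stop (for example, no postmaster.pid) is not fatal; still try to start.
    "${PG_CTL:-pg_ctl}" -D "$datadir" stop -m fast || true
    "${PG_CTL:-pg_ctl}" -D "$datadir" start
}
```

Usage: stop_then_start /usr/local/pgsql/data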