pg_create_logical_replication_slot hanging indefinitely due to old walsender process - postgresql

I am testing logical replication between 2 PostgreSQL 11 databases for use on our production (I was able to set it thanks to this answer - PostgreSQL logical replication - create subscription hangs) and it worked well.
Now I am testing scripts and procedure which would set it automatically on production databases but I am facing strange problem with logical replication slots.
I had to restart logical replica due to some changes in setting requiring restart - which of course could happen on replicas also in the future. But logical replication slot on master did not disconnect and it is still active for certain PID.
I dropped subscription on master (I am still only testing) and tried to repeat the whole process with new logical replication slot but I am facing strange situation.
I cannot create new logical replication slot with the new name. Process running on the old logical replication slot is still active and showing wait_event_type=Lock and wait_event=transaction.
When I try to use pg_create_logical_replication_slot to create new logical replication slot I get similar situation. New slot is created - I see it in pg_catalog but it is marked as active for the PID of the session which issued this command and command hangs indefinitely. When I check processes I can see this command active with same waiting values Lock/transaction.
I tried to activate parameter "lock_timeout" in postgresql.conf and reload configuration but it did not help.
Killing that old hanging process will most likely bring down the whole postgres because it is "walsender" process. It is visible in processes list still with IP of replica with status "idle wating".
I tried to find some parameter(s) which could help me to force postgres to stop this walsender. But settings wal_keep_segments or wal_sender_timeout did not change anything. I even tried to stop replica for longer time - no effect.
Is there some way to do something with this situation without restarting the whole postgres? Like forcing timeout for walsender or lock for transaction etc...
Because if something like this happens on production I would not be able to use restart or any other "brute force". Thanks...
UPDATE:
"Walsender" process "died out" after some time but log does not show anything about it so I do not know when exactly it happened. I can only guess it depends on tcp_keepalives_* parameters. Default on Debian 9 is 2 hours to keep idle process. So I tried to set these parameters in postgresql.conf and will see in following tests.

Strangely enough today everything works without any problems and no matter how I try to simulate yesterday's problems I cannot. Maybe there were some network communication problems in the cloud datacenter involved - we experienced some occasional timeouts in connections into other databases too.
So I really do not know the answer except for "wait until walsender process on master dies" - which can most likely be influenced by tcp_keepalives_* settings. Therefore I recommend to set them to some reasonable values in postgresql.conf because defaults on OS are usually too big.
Actually we use it on our big analytical databases (set both on PostgreSQL and OS) because of similar problems. Golang and nodejs programs calculating statistics from time to time failed to recognize that database session ended or died out in some cases and were hanging until OS ended the connection after 2 hours (default on Debian). All of it seemed to be always connected with network communication problems. With proper tcp_keepalives_* setting reaction is much quicker in case of problems.
After old walsender process dies on master you can repeat all steps and it should work. So looks like I just had bad luck yesterday...

Related

Prevent Data Loss during PostgreSQL Host Shutdown

So I've spent the better part of my day (and several searches before) looking for a workable solution to prevent data loss when the host of a PostgreSQL server installation gets rebooted or shut down. We maintain a number of Azure and on-prem servers and the number of times someone has inadvertently shut down the server without first ensuring Postgres is no longer flushing data to disk is far more frequent than it should be. Of note we are a Windows Server shop.
Our current best practice (which if followed appropriately works) is to stop the Postgres service, then watch disk writes to the Postgres data directory in Resource Monitor. Once nothing is writing to that directory, shut down the host. I have to think that there's a better way to ensure that it doesn't get shutdown in a manner that leads to data corruption, regardless of adherence to the best practice (or in some cases, because Windows Update mandates a reboot, regardless of configured settings telling it not to reboot).
Some things I've considered, but have been unable to find solid answers for:
Create a scheduled task that uses the "On an event" trigger to monitor the System log for event 1074. It would have to be configured to "run whether the user is logged in or not". The script would cancel the shutdown command with shutdown /a, then run a script to elegantly shutdown Postgres. I've seen mixed results on if the scheduled job would reliably trigger before Task Scheduler is terminated in the shutdown sequence.
Create a shutdown script using Group Policy. My question there is will it wait for the script to complete before executing the shutdown?
How do you deal with data loss in your Postgres server Windows hosts?
First, if you register PostgreSQL as a Windows service, a shutdown of the machine will automatically shut down PostgreSQL first.
But even without that, a properly configured PostgreSQL server on proper hardware will never suffer data loss (unless you hit a rare PostgreSQL software bug). It is one of the basic requirements for a relational database to survive crashes without data loss.
To enumerate a few things that come to mind:
make sure that the PostgreSQL parameters fsync and synchronous_commit are set to on
make sure that you are using a reliable file system for the data files and the WAL (a Windows network share is not a reliable file system)
make sure you are using storage that has no caches that are not battery-backed

Mongodb localhost:27017 replica set switched to secondary localhost:27027

I'm fairly new to mongodb, just a couple of months. I just converted my mongodb database to support a secondary replica set so I can watch collections. I only added one secondary which I'm guessing now may not be the best after reading you should create an odd number, but it is a localhost on one machine. I went through the instructions, got replication working fine for for half a day running my programs. But for some reason recently it has switched the database for port 27017 from primary to secondary. Primary was previously on localhost:27017 and secondary was on localhost:27027. Now my normal program can't connect to localhost:27017 without an error, which I believe it is because it is a secondary replica set now when it was primary before, assuming you can only connect to a primary. Here is the error msg.
Exception in thread "main" com.mongodb.MongoNotPrimaryException: Command failed with error 10107 (NotWritablePrimary): 'not master' on server localhost:27017.
I'm perplexed why mongodb switched the replica set primary in the first place. I doubt an error occurred, but certainly possible, but I haven't had a single "localhost" error in months of development.
For now, ideally how would I switch 27017 back to be the primary. How do I do that so my existing programs can function again?
Eventually when in production, what is the best methodology to handle this, assuming a lookup to a DNS entry to an ip address and suddenly the primary gets changed because of a fail over?
Given question 3 is a bit more involved, is there something I can do in my development environment to better simulate a production environment.
I use StackOverflow extensively but this is my first post so thanks for anyone who can provide advice.
Without knowing more about the replica configuration and circumstances of the switch over I'm not sure anyone could confidently answer question 1 but it may not be important compared to question 3.
When you want to manually switch the primary you can manipulate the priority settings:
https://docs.mongodb.com/manual/tutorial/force-member-to-be-primary/
Or run manual commands to freeze or step down the current primary:
https://docs.mongodb.com/manual/tutorial/force-member-to-be-primary/#force-a-member-to-be-primary-using-database-commands
The safest option is to ensure your application is aware of all replicas in the replica set. Then when you have these situations where something unexpected has happened the application will fail over to a writable db without any issues.
https://mongodb.github.io/mongo-java-driver/3.4/driver/tutorials/connect-to-mongodb/#connect-to-a-replica-set
I can only suggest setting up some VMs or containers as replica set members to better represent a production environment.
https://hub.docker.com/_/mongo
I was able to solve the problem by using the connect string which comprised both replica sets, which I was unaware I needed to do. Such as for java:
mongoClient = MongoClients.create("mongodb://localhost:27017,localhost:27027");
This also worked for Mongo Compass so I was able to connect to the secondary database. I didn't know you needed to provide paths to all replica sets when trying to connect, but in retrospect makes goods sense if something is down.
If you need a replica set for testing, you can create a single-node RS. Follow the instructions for creating a RS but only add one node.

What to do when a Google Cloud SQL postgres server upgrade (to a bigger machine) takes considerably longer than "a few minutes"?

We upgraded our Google Cloud SQL postgres server to a bigger machine and the upgrade is not terminating. In our experience, this usually takes less than 5 minutes, but we'ven been waiting for about 1.5 hours now and nothing is happening. There are no logs after the server shut down(except for failed connection attempts). We cannot switch to the failover, because there is already an operation in progress (namely the upgrade that's causing the problem in the first place). Restarting is disabled because the operation is in progress. It seems like there's nothing we can do right now, except maybe apply the last backup, though we're not sure if that's even possible while an operation is in progress.
Is there anything we can do to restart the DB or fix the problem?
When you upgrade a CloudSQL server, the instance is rebooted. It can happen occasionally that rebooting takes more than expected, which seems to be what happened to your server, but this is not unexpected behaviour.
This being said, be sure to check the status of the CloudSQL service. And if upgrades get stuck too often or never finish, contact support.
To reduce the chances of having this issue again:
Configure High Availability for your instance, so it has failover capability.
Make sure that the maintenance window of failover replicas is different from that of the master instance. To change the maintenance schedule, on the GCP console, go to SQL, click on an instance, and "Edit maintenance schedule"->"Set maintenance schedule". Then choose a window.

Slow replication recovery due to communication problems

We had lately several times the same problems on Google compute engine environment with PostgreSQL streaming replication and I would like to understand reasons and if I can repair it in some smoother way.
From time to time we see some communication problems in Google's internal network in GCE datacenter and they always trigger replication lags between our PG master and its replicas. All machines are Debian-8 and PostgreSQL 9.5.
When situation happens everything seems to be OK - no errors in PG logs on master or replicas just communication between master and replicas seems to be incredibly slow or repeatedly failing so new WAL logs are transfered to replicas with big delays and therefore replication lag is still growing.
Restart of replication from within PostgreSQl or restart of PostgreSQL on replica does not really help - after several WAL logs copied using scp in recovery command communication is back in previous incredibly slow status. Only restart of the whole instance help. When whole VM is restarted communication is back to normal and recovery even from lag many hours long is done in a few minutes. So main reason for this behavior seems to be on OS level. I tried to check net traffic but without finding anything significant. I also do not see anything relevant in any OS log.
Could restart of some OS service help? So I do not need to restart the whole VM? Thank you very much for any ideas.

Which PostgreSQL replication solution to use for my specific scenario

I need to replicate a PostgreSQL database server as follows:
Two servers are adjacent to each-other - one is the master and the other standby. If the master fails, the standby takes over. Replication from master to slave needs to be failsafe, hence, synchronous. The standby will not be used for any querying unless it has become a master. So, no high-availability/load-balancing is required.
There is another backup server at a remote location. Data from the master server mentioned above will be replicated to this remote server asynchronously and in batches. Time is not a factor at all in this replication - a couple of hours is just fine. This server would be used just for backup.
I've studied the currently available replication solutions from the PostgreSQL docs as well as from Google, but can't decide which combination of synchronous-asynchronous solutions would I need.
The closest I came up with is using pgpool-II for scenario 1 and Mammoth for scenario 2. However, as pgpool is statement-based, what would happen to queries containing rand() and now()?
Please note that I'd rather use free and open-source replication tools.
Also, just a side question - according to scenario 1 above, when the master fails, the standby will take over. Would the master-slave role be reversed after that, or would after the recovery of the master server the slave would go back to its standby state?
Any suggestion would be highly appreciated. Thanks.
I suggest using DRBD for scenario 1 and either 9.0 built-in replication or Slony for scenario 2.
Before PostgreSQL 9.1 (not yet released), there is no other synchronous replication solution available, and DRBD is widely established for this purpose. Together with Pacemaker or Heartbeat, which come with all the scripts needed for PostgreSQL monitoring and switchover, you have a very robust and fairly easy to manage solution. (In fact, I'd consider continuing to use DRBD even after 9.1 comes out; it's just a lot easier and has a longer track record.)
For the cross-site asynchronous, you could try the built-in replication of PostgreSQL 9.0, perhaps in conjunction with repmgr for monitoring and management. Alternatively, you could try the (now a bit) old-school Slony, but I'd guess it will more complicated for your needs.
You didn't mention if the server in question was on a specific version or if this was a new project with the freedom to choose the version. The answers vary based on that information.
If you are starting with a clean slate, I would recommend designing based on the PostgreSQL 9.1 beta. The final version will be released long before you would be ready to go into a production environment and it has binary synchronous replication built-in.
I've been using the built-in asynchronous replication in PostgreSQL for years in almost the exact same scenario you describe and it has always been rock-solid for me. It's become even better with 9.0 with Hot standby and it's become much easier to configure and maintain. 9.1 provides the only missing piece you require.
However, if you are trying to replicate an existing server, built-in asynchronous replication with aggressive settings for "checkpoint_timeout" a very frequent backup of unarchived WAL files could be sufficient until you can upgrade to 9.1.
The bottom line here is that you can get exactly what you want is with stock PostgreSQL 9.1--no third-party products required.
As for failover, it is not an automatic process, you'll need to handle that yourself. I would recommend that after a failover, switching the roles of the two machines until either the next failover event or until a controlled manual failover during a scheduled outage during a slow period of use. Again, this is not automatic and much be managed by the administrator (via shell scripts, presumably).